diff --git a/quick_start_demo/chunked_databricks_docs.jsonl b/quick_start_demo/chunked_databricks_docs.jsonl new file mode 100644 index 0000000..c5255e2 --- /dev/null +++ b/quick_start_demo/chunked_databricks_docs.jsonl @@ -0,0 +1,500 @@ +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL database tool\n##### Databricks SQL CLI\n\nNote \nThis article covers the Databricks SQL CLI, which is provided as-is and is not supported by Databricks through customer technical support channels. Questions and feature requests can be communicated through the [Issues](https://github.com/databricks/databricks-sql-cli/issues) page of the [databricks/databricks-sql-cli](https://github.com/databricks/databricks-sql-cli) repo on GitHub. \nThe Databricks SQL command line interface ([Databricks SQL CLI](https://github.com/databricks/databricks-sql-cli)) enables you to run SQL queries on your existing Databricks SQL [warehouses](https://docs.databricks.com/compute/sql-warehouse/index.html) from your terminal or Windows Command Prompt instead of from locations such as the Databricks SQL editor or a Databricks notebook. From the command line, you get productivity features such as suggestions and syntax highlighting.\n\n", "chunk_id": "8cd5aec21d6f7b6c42d8aac20991eab3", "url": "https://docs.databricks.com/dev-tools/databricks-sql-cli.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL database tool\n##### Databricks SQL CLI\n###### Requirements\n\n* At least one Databricks SQL [warehouse](https://docs.databricks.com/compute/sql-warehouse/index.html). [Create a warehouse](https://docs.databricks.com/compute/sql-warehouse/create.html), if you do not already have one.\n* [Python](https://www.python.org/) 3.7 or higher. To check whether you have Python installed, run the command `python --version` from your terminal or Command Prompt. (On some systems, you may need to enter `python3` instead.) 
[Install Python](https://www.python.org/downloads/), if you do not already have it installed.\n* [pip](https://pip.pypa.io/), the package installer for Python. Newer versions of Python install `pip` by default. To check whether you have `pip` installed, run the command `pip --version` from your terminal or Command Prompt. (On some systems, you may need to enter `pip3` instead.) [Install pip](https://pip.pypa.io/en/stable/installation/), if you do not already have it installed.\n* (Optional) A utility for creating and managing Python [virtual environments](https://packaging.python.org/en/latest/tutorials/installing-packages/#creating-and-using-virtual-environments), such as [venv](https://packaging.python.org/en/latest/key_projects/#pipenv). Virtual environments help to ensure that you are using the correct versions of Python and the Databricks SQL CLI together. Setting up and using virtual environments is outside of the scope of this article. For more information, see [Creating Virtual Environments](https://packaging.python.org/en/latest/tutorials/installing-packages/#creating-and-using-virtual-environments).\n\n", "chunk_id": "512e5e04e3eaa0215bbc6649c912fe90", "url": "https://docs.databricks.com/dev-tools/databricks-sql-cli.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL database tool\n##### Databricks SQL CLI\n###### Install the Databricks SQL CLI\n\nAfter you meet the [requirements](https://docs.databricks.com/dev-tools/databricks-sql-cli.html#requirements), install the [Databricks SQL CLI](https://pypi.org/project/databricks-sql-cli) package from the Python Package Index (PyPI). To install the package, run `pip` with one of the following commands. 
\n```\npip install databricks-sql-cli\n\n# Or...\n\npython -m pip install databricks-sql-cli\n\n``` \nTo upgrade a previously installed version of the Databricks SQL CLI, run `pip` with one of the following commands. \n```\npip install databricks-sql-cli --upgrade\n\n# Or...\n\npython -m pip install databricks-sql-cli --upgrade\n\n``` \nTo check your installed version of the Databricks SQL CLI, run `pip` with one of the following commands. \n```\npip show databricks-sql-cli\n\n# Or...\n\npython -m pip show databricks-sql-cli\n\n```\n\n", "chunk_id": "71ba3110b1390c829096342fd64a5caf", "url": "https://docs.databricks.com/dev-tools/databricks-sql-cli.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL database tool\n##### Databricks SQL CLI\n###### Authentication\n\nTo authenticate, you must provide the Databricks SQL CLI with your warehouse\u2019s [connection details](https://docs.databricks.com/integrations/compute-details.html). Specifically, you need the **Server hostname** and **HTTP path** values. You must also provide the Databricks SQL CLI with the proper authentication credentials. \nThe Databricks SQL CLI supports two Databricks authentication types: [Databricks personal access token authentication](https://docs.databricks.com/dev-tools/auth/pat.html) and, for Databricks SQL CLI versions 0.2.0 and above, [OAuth user-to-machine (U2M) authentication](https://docs.databricks.com/dev-tools/auth/oauth-u2m.html). \nTo use Databricks personal access token authentication, create a personal access token as follows: \n1. In your Databricks workspace, click your Databricks username in the top bar, and then select **Settings** from the drop-down menu.\n2. Click **Developer**.\n3. Next to **Access tokens**, click **Manage**.\n4. Click **Generate new token**.\n5. (Optional) Enter a comment that helps you to identify this token in the future, and change the token\u2019s default lifetime of 90 days. 
To create a token with no lifetime (not recommended), leave the **Lifetime (days)** box empty (blank).\n6. Click **Generate**.\n7. Copy the displayed token to a secure location, and then click **Done**. \nNote \nBe sure to save the copied token in a secure location. Do not share your copied token with others. If you lose the copied token, you cannot regenerate that exact same token. Instead, you must repeat this procedure to create a new token. If you lose the copied token, or you believe that the token has been compromised, Databricks strongly recommends that you immediately delete that token from your workspace by clicking the trash can (**Revoke**) icon next to the token on the **Access tokens** page. \nIf you are not able to create or use tokens in your workspace, this might be because your workspace administrator has disabled tokens or has not given you permission to create or use tokens. See your workspace administrator or the following: \n* [Enable or disable personal access token authentication for the workspace](https://docs.databricks.com/admin/access-control/tokens.html#enable-tokens)\n* [Personal access token permissions](https://docs.databricks.com/security/auth-authz/api-access-permissions.html#pat) \nThere are no setup requirements to use OAuth U2M authentication. \nYou can provide this authentication information to the Databricks SQL CLI in several ways: \n* In the `dbsqlclirc` settings file in its default location (or by specifying an alternate settings file through the `--clirc` option each time you run a command with the Databricks SQL CLI). See [Settings file](https://docs.databricks.com/dev-tools/databricks-sql-cli.html#settings-file).\n* For Databricks personal access token authentication, by setting the `DBSQLCLI_HOST_NAME`, `DBSQLCLI_HTTP_PATH` and `DBSQLCLI_ACCESS_TOKEN` environment variables. 
See [Environment variables](https://docs.databricks.com/dev-tools/databricks-sql-cli.html#environment-variables).\n* For Databricks OAuth U2M authentication, by setting the `DBSQLCLI_HOST_NAME` and `DBSQLCLI_HTTP_PATH` environment variables, and specifying the `--oauth` command-line option or setting `auth_type = \"databricks-oauth\"` in the `dbsqlclirc` settings file. See [Environment variables](https://docs.databricks.com/dev-tools/databricks-sql-cli.html#environment-variables).\n* For Databricks personal access token authentication, by specifying the `--hostname`, `--http-path`, and `--access-token` options each time you run a command with the Databricks SQL CLI. See [Command options](https://docs.databricks.com/dev-tools/databricks-sql-cli.html#command-options).\n* For Databricks OAuth U2M authentication, by specifying the `--hostname` and `--http-path` command-line options, and specifying the `--oauth` command-line option or setting `auth_type = \"databricks-oauth\"` in the `dbsqlclirc` settings file, each time you run a command with the Databricks SQL CLI. See [Command options](https://docs.databricks.com/dev-tools/databricks-sql-cli.html#command-options). \nNote \nThe `dbsqlclirc` settings file must be present, even if you set the preceding environment variables or specify the preceding command options or both. \nWhenever you run the Databricks SQL CLI, it looks for authentication details in the following order and stops when it finds the first set of details: \n1. The `--hostname`, `--http-path`, and `--access-token` or `--oauth` options.\n2. The `DBSQLCLI_HOST_NAME` and `DBSQLCLI_HTTP_PATH` environment variables (and, for Databricks personal access token authentication, the `DBSQLCLI_ACCESS_TOKEN` environment variable).\n3. The `dbsqlclirc` settings file in its default location (or an alternate settings file specified by the `--clirc` option). 
\n### Settings file \nTo use the `dbsqlclirc` settings file to provide the Databricks SQL CLI with authentication details for your Databricks SQL warehouse, run the Databricks SQL CLI for the first time, as follows: \n```\ndbsqlcli\n\n``` \nThe Databricks SQL CLI creates a settings file for you, at `~/.dbsqlcli/dbsqlclirc` on Unix, Linux, and macOS, and at `%HOMEDRIVE%%HOMEPATH%\\.dbsqlcli\\dbsqlclirc` or `%USERPROFILE%\\.dbsqlcli\\dbsqlclirc` on Windows. To customize this file: \n1. Use a text editor to open and edit the `dbsqlclirc` file.\n2. Scroll to the following section: \n```\n# [credentials]\n# host_name = \"\"\n# http_path = \"\"\n# access_token = \"\"\n\n```\n3. Remove the four `#` characters, and: \n1. Next to `host_name`, enter your warehouse\u2019s **Server hostname** value from the requirements between the `\"\"` characters.\n2. Next to `http_path`, enter your warehouse\u2019s **HTTP path** value from the requirements between the `\"\"` characters.\n3. Next to `access_token`, enter your personal access token value from the requirements between the `\"\"` characters. \nNote \nFor Databricks OAuth U2M authentication, you must replace `access_token` with `auth_type = \"databricks-oauth\"`, or specify the `--oauth` command-line option with every call to the Databricks SQL CLI. \nFor example: \n```\n[credentials]\nhost_name = \"dbc-a1b2345c-d6e78.cloud.databricks.com\"\nhttp_path = \"/sql/1.0/warehouses/1abc2d3456e7f890a\"\naccess_token = \"dapi12345678901234567890123456789012\"\n\n```\n4. Save the `dbsqlclirc` file. \nAlternatively, instead of using the `dbsqlclirc` file in its default location, you can specify a file in a different location by adding the `--clirc` command option and the path to the alternate file. That alternate file\u2019s contents must conform to the preceding syntax. 
\n### Environment variables \nTo use the `DBSQLCLI_HOST_NAME` and `DBSQLCLI_HTTP_PATH` environment variables (and, for Databricks personal access token authentication, the `DBSQLCLI_ACCESS_TOKEN` environment variable) to provide the Databricks SQL CLI with authentication details for your Databricks SQL warehouse, do the following. \nNote \nFor Databricks OAuth U2M authentication, you must set `auth_type = \"databricks-oauth\"` in the `dbsqlclirc` settings file, or specify the `--oauth` command option with every call to the Databricks SQL CLI. \nTo set the environment variables for only the current terminal session, run the following commands. To set the environment variables for all terminal sessions, enter the following commands into your shell\u2019s startup file and then restart your terminal. In the following commands, replace the value of: \n* `DBSQLCLI_HOST_NAME` with your warehouse\u2019s **Server hostname** value from the requirements.\n* `DBSQLCLI_HTTP_PATH` with your warehouse\u2019s **HTTP path** value from the requirements.\n* `DBSQLCLI_ACCESS_TOKEN` with your personal access token value from the requirements. 
\n```\nexport DBSQLCLI_HOST_NAME=\"dbc-a1b2345c-d6e78.cloud.databricks.com\"\nexport DBSQLCLI_HTTP_PATH=\"/sql/1.0/warehouses/1abc2d3456e7f890a\"\nexport DBSQLCLI_ACCESS_TOKEN=\"dapi12345678901234567890123456789012\"\n\n``` \nTo set the environment variables for only the current Command Prompt session, run the following commands, replacing the value of: \n* `DBSQLCLI_HOST_NAME` with your warehouse\u2019s **Server hostname** value from the requirements.\n* `DBSQLCLI_HTTP_PATH` with your warehouse\u2019s **HTTP path** value from the requirements.\n* `DBSQLCLI_ACCESS_TOKEN` with your personal access token value from the requirements. \n```\nset DBSQLCLI_HOST_NAME=\"dbc-a1b2345c-d6e78.cloud.databricks.com\"\nset DBSQLCLI_HTTP_PATH=\"/sql/1.0/warehouses/1abc2d3456e7f890a\"\nset DBSQLCLI_ACCESS_TOKEN=\"dapi12345678901234567890123456789012\"\n\n``` \nTo set the environment variables for all Command Prompt sessions, run the following commands and then restart your Command Prompt, replacing the value of: \n* `DBSQLCLI_HOST_NAME` with your warehouse\u2019s **Server hostname** value from the requirements.\n* `DBSQLCLI_HTTP_PATH` with your warehouse\u2019s **HTTP path** value from the requirements.\n* `DBSQLCLI_ACCESS_TOKEN` with your personal access token value from the requirements. 
\n```\nsetx DBSQLCLI_HOST_NAME \"dbc-a1b2345c-d6e78.cloud.databricks.com\"\nsetx DBSQLCLI_HTTP_PATH \"/sql/1.0/warehouses/1abc2d3456e7f890a\"\nsetx DBSQLCLI_ACCESS_TOKEN \"dapi12345678901234567890123456789012\"\n\n``` \n### Command options \nTo use the `--hostname`, `--http-path`, and `--access-token` or `--oauth` options to provide the Databricks SQL CLI with authentication details for your Databricks SQL warehouse, do the following every time you run a command with the Databricks SQL CLI: \n* Specify the `--hostname` option and your warehouse\u2019s **Server hostname** value from the requirements.\n* Specify the `--http-path` option and your warehouse\u2019s **HTTP path** value from the requirements.\n* For Databricks personal access token authentication, specify the `--access-token` option and your personal access token value from the requirements.\n* For Databricks OAuth U2M authentication, specify `--oauth`. \nNote \nFor Databricks OAuth U2M authentication, you must set `auth_type = \"databricks-oauth\"` in the `dbsqlclirc` settings file, or specify the `--oauth` command option with every call to the Databricks SQL CLI. 
\nFor example: \nFor Databricks personal access token authentication: \n```\ndbsqlcli -e \"SELECT * FROM default.diamonds LIMIT 2\" \\\n--hostname \"dbc-a1b2345c-d6e78.cloud.databricks.com\" \\\n--http-path \"/sql/1.0/warehouses/1abc2d3456e7f890a\" \\\n--access-token \"dapi12345678901234567890123456789012\"\n\n``` \nFor Databricks OAuth U2M authentication: \n```\ndbsqlcli -e \"SELECT * FROM default.diamonds LIMIT 2\" \\\n--hostname \"dbc-a1b2345c-d6e78.cloud.databricks.com\" \\\n--http-path \"/sql/1.0/warehouses/1abc2d3456e7f890a\" \\\n--oauth\n\n```\n\n", "chunk_id": "b047db13f251967bd5c1f85c6358f880", "url": "https://docs.databricks.com/dev-tools/databricks-sql-cli.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL database tool\n##### Databricks SQL CLI\n###### Query sources\n\nThe Databricks SQL CLI enables you to run queries in the following ways: \n* From a [query string](https://docs.databricks.com/dev-tools/databricks-sql-cli.html#query-string).\n* From a [file](https://docs.databricks.com/dev-tools/databricks-sql-cli.html#file).\n* In a read-evaluate-print loop ([REPL](https://docs.databricks.com/dev-tools/databricks-sql-cli.html#repl)) approach. This approach provides suggestions as you type. \n### Query string \nTo run a query as a string, use the `-e` option followed by the query, represented as a string. 
For example: \n```\ndbsqlcli -e \"SELECT * FROM default.diamonds LIMIT 2\"\n\n``` \nOutput: \n```\n_c0,carat,cut,color,clarity,depth,table,price,x,y,z\n1,0.23,Ideal,E,SI2,61.5,55,326,3.95,3.98,2.43\n2,0.21,Premium,E,SI1,59.8,61,326,3.89,3.84,2.31\n\n``` \nTo switch output formats, use the `--table-format` option along with a value such as `ascii` for ASCII table format, for example: \n```\ndbsqlcli -e \"SELECT * FROM default.diamonds LIMIT 2\" --table-format ascii\n\n``` \nOutput: \n```\n+-----+-------+---------+-------+---------+-------+-------+-------+------+------+------+\n| _c0 | carat | cut | color | clarity | depth | table | price | x | y | z |\n+-----+-------+---------+-------+---------+-------+-------+-------+------+------+------+\n| 1 | 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |\n| 2 | 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |\n+-----+-------+---------+-------+---------+-------+-------+-------+------+------+------+\n\n``` \nFor a list of available output format values, see the comments for the `table_format` setting in the `dbsqlclirc` file. \n### File \nTo run a file that contains SQL, use the `-e` option followed by the path to a `.sql` file. 
For example: \n```\ndbsqlcli -e my-query.sql\n\n``` \nContents of the example `my-query.sql` file: \n```\nSELECT * FROM default.diamonds LIMIT 2;\n\n``` \nOutput: \n```\n_c0,carat,cut,color,clarity,depth,table,price,x,y,z\n1,0.23,Ideal,E,SI2,61.5,55,326,3.95,3.98,2.43\n2,0.21,Premium,E,SI1,59.8,61,326,3.89,3.84,2.31\n\n``` \nTo switch output formats, use the `--table-format` option along with a value such as `ascii` for ASCII table format, for example: \n```\ndbsqlcli -e my-query.sql --table-format ascii\n\n``` \nOutput: \n```\n+-----+-------+---------+-------+---------+-------+-------+-------+------+------+------+\n| _c0 | carat | cut | color | clarity | depth | table | price | x | y | z |\n+-----+-------+---------+-------+---------+-------+-------+-------+------+------+------+\n| 1 | 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |\n| 2 | 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |\n+-----+-------+---------+-------+---------+-------+-------+-------+------+------+------+\n\n``` \nFor a list of available output format values, see the comments for the `table_format` setting in the `dbsqlclirc` file. \n### REPL \nTo enter read-evaluate-print loop (REPL) mode scoped to the default database, run the following command: \n```\ndbsqlcli\n\n``` \nYou can also enter REPL mode scoped to a specific database, by running the following command: \n```\ndbsqlcli \n\n``` \nFor example: \n```\ndbsqlcli default\n\n``` \nTo exit REPL mode, run the following command: \n```\nexit\n\n``` \nIn REPL mode, you can use the following characters and keys: \n* Use the semicolon (`;`) to end a line.\n* Use **F3** to toggle multiline mode.\n* Use the spacebar to show suggestions at the insertion point, if suggestions are not already displayed.\n* Use the up and down arrows to navigate suggestions.\n* Use the right arrow to complete the highlighted suggestion. 
\nFor example: \n```\ndbsqlcli default\n\nhostname:default> SELECT * FROM diamonds LIMIT 2;\n\n+-----+-------+---------+-------+---------+-------+-------+-------+------+------+------+\n| _c0 | carat | cut | color | clarity | depth | table | price | x | y | z |\n+-----+-------+---------+-------+---------+-------+-------+-------+------+------+------+\n| 1 | 0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |\n| 2 | 0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |\n+-----+-------+---------+-------+---------+-------+-------+-------+------+------+------+\n\n2 rows in set\nTime: 0.703s\n\nhostname:default> exit\n\n```\n\n", "chunk_id": "326b42cd7d983ec095af9affc6738fa3", "url": "https://docs.databricks.com/dev-tools/databricks-sql-cli.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL database tool\n##### Databricks SQL CLI\n###### Logging\n\nThe Databricks SQL CLI logs its messages to the file `~/.dbsqlcli/app.log` by default. To change this file name or location, change the value of the `log_file` setting in the `dbsqlclirc` [settings file](https://docs.databricks.com/dev-tools/databricks-sql-cli.html#settings-file). \nBy default, messages are logged at the `INFO` log level and below. To change this log level, change the value of the `log_level` setting in the `dbsqlclirc` settings file. Available log level values include `CRITICAL`, `ERROR`, `WARNING`, `INFO`, and `DEBUG` and are evaluated in that order. 
`NONE` disables logging.\n\n##### Databricks SQL CLI\n###### Additional resources\n\n* [Databricks SQL CLI README](https://github.com/databricks/databricks-sql-cli/blob/main/README.md) \n* [Databricks SQL Statement Execution API tutorial](https://docs.databricks.com/dev-tools/sql-execution-tutorial.html)\n\n", "chunk_id": "173990b50ed71422558365067af67484", "url": "https://docs.databricks.com/dev-tools/databricks-sql-cli.html"} +{"chunked_text": "# Databricks administration introduction\n## Manage users\n### service principals\n#### and groups\n##### Sync users and groups from your identity provider\n####### Configure SCIM provisioning for Okta\n\nThis article describes how to set up Databricks provisioning using Okta. Your Okta tenant must be using the Okta Lifecycle Management feature in order to provision users and groups to Databricks. \nYou can set up provisioning at the Databricks account level or at the Databricks workspace level. \nDatabricks recommends that you provision users, service principals, and groups to the account level and assign users and groups to workspaces using [identity federation](https://docs.databricks.com/admin/users-groups/index.html#enable-identity-federation). If you have any workspaces not enabled for identity federation, you must continue to provision users, service principals, and groups directly to those workspaces. \nTo learn more about SCIM provisioning in Databricks, including an explanation of the impact of identity federation on provisioning and advice about when to use account-level and workspace-level provisioning, see [Sync users and groups from your identity provider](https://docs.databricks.com/admin/users-groups/scim/index.html). \nFor a user to log in using Okta, you must configure single sign-on from Okta to Databricks. 
To configure single sign-on, see [SSO in your Databricks account console](https://docs.databricks.com/admin/account-settings-e2/single-sign-on/index.html).\n\n####### Configure SCIM provisioning for Okta\n######## Features\n\nDatabricks is available as a provisioning app in the Okta Integration Network (OIN), enabling you to use Okta to provision users and groups with Databricks automatically. \nThe Databricks Okta application allows you to: \n* Invite users to a Databricks account or workspace\n* Add invited or active users to groups\n* Deactivate existing users in a Databricks account or workspace\n* Manage groups and group membership\n* Update and manage profiles\n\n", "chunk_id": "7768e1d7c71ad9e76526c09d53206d94", "url": "https://docs.databricks.com/admin/users-groups/scim/okta.html"} +{"chunked_text": "# Databricks administration introduction\n## Manage users\n### service principals\n#### and groups\n##### Sync users and groups from your identity provider\n####### Configure SCIM provisioning for Okta\n######## Requirements\n\n* Your Databricks account must have the [Premium plan or above](https://databricks.com/product/pricing/platform-addons).\n* You must be an Okta developer user.\n* To set up provisioning for your Databricks account, you must be a Databricks account admin.\n* To set up provisioning for a Databricks workspace, you must be a Databricks workspace admin. \n* Configure single sign-on for users to log in to Databricks using Okta. 
See [SSO in your Databricks account console](https://docs.databricks.com/admin/account-settings-e2/single-sign-on/index.html).\n\n", "chunk_id": "c26f9a9aa9f390fe55ee31bf5a4c4db1", "url": "https://docs.databricks.com/admin/users-groups/scim/okta.html"} +{"chunked_text": "# Databricks administration introduction\n## Manage users\n### service principals\n#### and groups\n##### Sync users and groups from your identity provider\n####### Configure SCIM provisioning for Okta\n######## Set up account-level SCIM provisioning using Okta\n\nThis section describes how to configure an Okta SCIM connector to provision users and groups to your account. \n### Get the SCIM token and account SCIM URL in Databricks \n1. As an account admin, log in to the Databricks [account console](https://accounts.cloud.databricks.com). \n1. Click ![User Settings Icon](https://docs.databricks.com/_images/user-settings-icon.png) **Settings**.\n2. Click **User Provisioning**.\n3. Click **Set up user provisioning**. \nCopy the SCIM token and the Account SCIM URL. You will use these to configure your connector in Okta. \nNote \nThe SCIM token is restricted to the Account SCIM API `/api/2.0/accounts/{account_id}/scim/v2/` and cannot be used to authenticate to other Databricks REST APIs. \n### Configure SCIM provisioning in Okta \n1. Log in to the Okta admin portal.\n2. Go to **Applications** and click **Browse App Catalog**.\n3. Search for Databricks in the **Browse App Integration Catalog**.\n4. Click **Add integration**.\n5. In **Add Databricks** configure the following: \n* In **Application label**, enter a name for your application.\n* Select **Do not display application icon to users**.\n* Select **Do not display application icon in the Okta Mobile App**.\n6. Click **Done**.\n7. Click **Provisioning** and enter the following: \n* In **Provisioning Base URL**, enter the SCIM URL you copied from Databricks.\n* In **Provisioning API Token**, enter the SCIM token you copied from Databricks.\n8. 
Click **Test API Credentials**, verify the connection was successful, and then click **Save**.\n9. Reload the **Provisioning** tab. Additional settings appear after a successful test of the API credentials.\n10. To configure the behavior when pushing Okta changes to Databricks, click **Provisioning to App**. \n* Click **Edit**. Enable the features you need. Databricks recommends enabling **Create users**, **Update user attributes**, and **Deactivate users**.\n* In **Databricks Attribute Mappings**, verify your Databricks Attribute Mappings. These mappings will depend on the options you enabled above. You can add and edit mappings to fit your needs. See [Map application attributes on the Provisioning page](https://help.okta.com/en/prod/Content/Topics/users-groups-profiles/usgp-map-attributes-provisioning.htm) in the Okta documentation.\n11. To configure the behavior when pushing Databricks changes to Okta, click **To Okta**. The default settings work well for Databricks provisioning. If you want to update the default settings and attribute mappings, see [Provisioning and Deprovisioning](https://help.okta.com/en/prod/Content/Topics/Apps/Provisioning_Deprovisioning_Overview.htm) in the Okta documentation. \n### Test the integration \nTo test the configuration, use Okta to invite a user to your Databricks account. \n1. In Okta, go to **Applications** and click **Databricks**.\n2. Click **Provisioning**.\n3. Click **Assign**, then **Assign to people**.\n4. Search for an Okta user, and click **Assign**.\n5. Confirm the user\u2019s details, click **Assign and go back**, and then click **Done**.\n6. Log in to the [account console](https://accounts.cloud.databricks.com), click ![Account Console user management icon](https://docs.databricks.com/_images/user-management.png) **User management**, and then confirm that the user has been added. 
\nAfter this simple test, you can perform bulk operations as described in [Use Okta to manage users and groups in Databricks](https://docs.databricks.com/admin/users-groups/scim/okta.html#manage-users).\n\n", "chunk_id": "9e29e43953893fdeebf7309c03351dae", "url": "https://docs.databricks.com/admin/users-groups/scim/okta.html"} +{"chunked_text": "# Databricks administration introduction\n## Manage users\n### service principals\n#### and groups\n##### Sync users and groups from your identity provider\n####### Configure SCIM provisioning for Okta\n######## Set up workspace-level SCIM provisioning using Okta (legacy)\n\nPreview \nThis feature is in [Public Preview](https://docs.databricks.com/release-notes/release-types.html). \nThis section describes how to set up provisioning from Okta directly to Databricks workspaces. \n### Get the API token and SCIM URL in Databricks \n1. As a Databricks workspace administrator, generate a personal access token. See [Token management](https://docs.databricks.com/api/workspace/tokenmanagement). Store the personal access token in a secure location. \nImportant \nThe user who owns this personal access token must not be managed within Okta. Otherwise, removing the user from Okta would disrupt the SCIM integration.\n2. Make a note of the following URL, which is required for configuring Okta: \n`https:///api/2.0/preview/scim/v2` \nReplace `` with the [workspace URL](https://docs.databricks.com/workspace/workspace-details.html#workspace-url) of your Databricks deployment. See [Get identifiers for workspace objects](https://docs.databricks.com/workspace/workspace-details.html). \nKeep this browser tab open. \n### Configure SCIM provisioning in the Databricks SAML application in Okta \n1. Go to **Applications** and click **Databricks**.\n2. Click **Provisioning**. 
Enter the following information obtained from the above section: \n* **Provisioning Base URL:** the provisioning endpoint\n* **Provisioning API Token:** the personal access token\n3. Click **Test API Credentials**.\n4. Reload the **Provisioning** tab. Additional settings appear after a successful test of the API credentials.\n5. To configure the behavior when pushing Okta changes to Databricks, click **To App**. \n* In **General**, click **Edit**. Enable the features you need. Databricks recommends enabling **Create users** at a minimum.\n* In **Databricks Attribute Mappings**, verify your Databricks Attribute Mappings. These mappings will depend on the options you enabled above. You can add and edit mappings to fit your needs. See [Map application attributes on the Provisioning page](https://help.okta.com/en/prod/Content/Topics/users-groups-profiles/usgp-map-attributes-provisioning.htm) in the Okta documentation.\n6. To configure the behavior when pushing Databricks changes to Okta, click **To Okta**. The default settings work well for Databricks provisioning. If you want to update the default settings and attribute mappings, see [Provisioning and Deprovisioning](https://help.okta.com/en/prod/Content/Topics/Apps/Provisioning_Deprovisioning_Overview.htm) in the Okta documentation. \n### Test the integration \nTo test the configuration, use Okta to invite a user to your Databricks workspace. \n1. In Okta, go to **Applications** and click **Databricks**.\n2. Click the **Assign** tab, then **Assign to people**.\n3. Search for an Okta user, and click **Assign**.\n4. Confirm the user\u2019s details. Click **Done**.\n5. In the Databricks workspace admin settings page, click **Identity and access** tab, then go to the **Users** section and confirm that the user is added. At a minimum, grant the user the Workspace entitlement. 
\nAfter this simple test, you can perform bulk operations, as described in the following sections.\n\n", "chunk_id": "e01ca569b41093006f72e46b9d5ebb5a", "url": "https://docs.databricks.com/admin/users-groups/scim/okta.html"} +{"chunked_text": "# Databricks administration introduction\n## Manage users\n### service principals\n#### and groups\n##### Sync users and groups from your identity provider\n####### Configure SCIM provisioning for Okta\n######## Use Okta to manage users and groups in Databricks\n\nThis section describes bulk operations you can perform using Okta SCIM provisioning to your Databricks account or workspaces. \n### Import users from the Databricks workspace to Okta \nTo import users from Databricks to Okta, go to the **Import** tab and click **Import Now**. You are prompted to review and confirm assignments for any users who are not automatically matched to existing Okta users by email address. \n### Add user and group assignments to your Databricks account \nTo verify or add user and group assignments, go to the **Assignments** tab. Databricks recommends adding the Okta group named **Everyone** to the account-level SCIM provisioning application. This syncs all users in your organization to the Databricks account. \n### Push groups to Databricks \nTo push groups from Okta to Databricks, go to the **Push Groups** tab. Users who already exist in Databricks are matched by email address. \n### Delete a user or group from the account \nIf you delete a user from the account-level Databricks application in Okta, the user is deleted in the Databricks account and loses access to all workspaces, whether or not those workspaces are enabled for identity federation. \nIf you delete a group from the account-level Databricks application in Okta, all users in that group are deleted from the account and lose access to any workspaces they had access to (unless they are members of another group or have been directly granted access to the account or any workspaces). 
Databricks recommends that you refrain from deleting account-level groups unless you want the members of those groups to lose access to all workspaces in the account. \nBe aware of the following consequences of deleting users: \n* Applications or scripts that use the tokens generated by the user can no longer access Databricks APIs\n* Jobs owned by the user fail\n* Clusters owned by the user stop\n* Queries or dashboards created by the user and shared using the Run as Owner credential have to be assigned to a new owner to prevent sharing from failing \n### Delete a deactivated user from the workspace \nIf you delete a user from the workspace-level Databricks application in Okta, the user is *deactivated* in the Databricks workspace but is not removed from the workspace. A deactivated user does not have the `workspace-access` or `databricks-sql-access` entitlement. Deactivating a user is reversible: you can reactivate the user either by re-adding the user in Okta or by using the Databricks SCIM API directly. Removing a user from a Databricks workspace is disruptive and non-reversible. \nImportant \nDo not deactivate the administrator who configured the Okta SCIM provisioning app. Otherwise, the SCIM integration cannot authenticate to Databricks. \nTo remove a user from a Databricks workspace: \n1. In the admin settings page, go to the **Users** tab.\n2. Click the **x** at the end of the line for the user. 
\nBe aware of the following consequences of removing the user: \n* Applications or scripts that use the tokens generated by the user will no longer be able to access the Databricks API\n* Jobs owned by the user will fail\n* Clusters owned by the user will stop\n* Queries or dashboards created by the user and shared using the Run as Owner credential will have to be assigned to a new owner to prevent sharing from failing\n\n", "chunk_id": "a074114733f63d8cf8225e5c7d273c84", "url": "https://docs.databricks.com/admin/users-groups/scim/okta.html"} +{"chunked_text": "# Databricks administration introduction\n## Manage users\n### service principals\n#### and groups\n##### Sync users and groups from your identity provider\n####### Configure SCIM provisioning for Okta\n######## Use Okta to manage workspace admins, entitlements, and IAM roles\n\nDatabricks supports the assignment of workspace admins, IAM roles, and workspace entitlements from workspace-level Databricks applications in Okta. The assignment of roles and entitlements is not supported from the account-level Databricks application in Okta. If you want to assign IAM roles and workspace entitlements from Okta, you must create a workspace-level Databricks application in Okta for that workspace. \nDatabricks recommends that you instead use an account-level Databricks application in Okta to provision users, service principals, and groups to the account level. You assign users and groups to workspaces using [identity federation](https://docs.databricks.com/admin/users-groups/index.html#enable-identity-federation) and manage their entitlements and IAM roles within Databricks. \n### Sync workspace admins \nDatabricks supports the assignment of the workspace admin role from the workspace-level Databricks application in Okta. Workspace admins are members of the Databricks `admins` group. Databricks groups are automatically pushed to Okta. To add a new admin user in Okta, add that user to the `admins` group. 
\nImportant \nDo not remove the administrator who configured the Okta SCIM provisioning app, and do not remove them from the `admins` group. Otherwise, the SCIM integration cannot authenticate to Databricks. \n### Assign workspace entitlements from Okta \nDatabricks supports the assignment of entitlements from the workspace-level Databricks application in Okta. However, in most instances, Databricks recommends managing entitlements from within Databricks. Within Databricks, you can easily assign or revoke an entitlement. Configuring the mappings in Okta is complex, and you must configure two mappings for each entitlement. \nThis section describes how to configure the mappings to grant the `databricks-sql-access` entitlement to an Okta user. \nImportant \nBy default, Databricks users inherit the `workspace-access` and `databricks-sql-access` entitlements. By default, Databricks admin users inherit the `create-cluster` entitlement. You don\u2019t need to assign these inherited entitlements from Okta. \nTo revoke an inherited entitlement from a user, either remove the user from the group or remove the entitlement from the group. To remove an entitlement, you must use the Databricks admin console. \nTo assign the `databricks-sql-access` entitlement: \n1. In the Okta admin console, go to **Directory > Profile Editor**.\n2. Click the **Profile** edit button for the Okta user profile.\n3. Click the **+ Add Attribute** button to add a role.\n4. In the Add Attribute dialog, set the **Display name** to `Databricks SQL` and the **Variable name** to `databricks_sql`. \nNote \nOkta variables cannot contain the hyphen (`-`) character.\n5. Return to the Profile Editor and click the **Profile** edit button for the Databricks provisioning app user profile.\n6. Click the **+ Add Attribute** button to add a role.\n7. 
On the Add Attribute dialog, give the role attribute the following values: \n* **Display name**: `Databricks SQL`\n* **Variable name**: `databricks_sql`\n* **External Name** in the format `entitlements.^[type=='$TYPE'].value`. `$TYPE` is the [API name](https://docs.databricks.com/security/auth-authz/entitlements.html) of the entitlement without dashes (`-`). For example, the External Name for `databricks-sql-access` is `entitlements.^[type=='databrickssqlaccess'].value`.\nImportant \nIn the External Name format, you must use apostrophe characters (`'`). If you use curly quote characters (`\u2019`), a `Request is unparseable` error occurs. \n* **External Namespace**: `urn:ietf:params:scim:schemas:core:2.0:User`.\n![add Databricks role attribute](https://docs.databricks.com/_images/okta-add-databricks-attrib-role.png)\n8. Return to the Profile Editor and click the **Mappings** edit button for the Databricks provisioning app user profile.\n9. For **Databricks to Okta**, map `appuser.databricks_sql` in the Databricks column to `databricks_sql` in the Okta column.\n10. For **Okta to Databricks**, map `user.databricks_sql` in the Databricks column to `databricks_sql` in the Okta column.\n11. Click **Save Mappings**.\n12. To add an entitlement value to a user, go to **Directory > People**, select a user, and go to the **Profile** tab in the user page. \nClick the **Edit** button. In the field for the entitlement, enter the API name of the entitlement without dashes, such as `databrickssqlaccess`. When you assign the user to the app, the role is populated with the value that you entered. \nRepeat this procedure to assign additional entitlements. \n### Assign IAM roles from Okta \nTo assign IAM roles to users from Okta, you must create a multi-valued attribute in the Okta user profile and the Okta Databricks provisioning app profile, and then map these attributes to attributes in the Databricks SCIM API. 
For example, if you want to assign two IAM roles to a user, you must create two attributes in the Databricks provisioning app and map one Okta user attribute to each. \nDatabricks recommends managing IAM role assignments from within Databricks. Within Databricks, you can easily assign or revoke an IAM role. Configuring the mappings in Okta is complex, and you must configure separate mappings for each IAM role. \nThe following instructions assign the `primary_role` attribute. \n1. In the Okta admin console, go to **Directory > Profile Editor**.\n2. Click the **Profile** edit button for the Okta user profile.\n3. Click the **+ Add Attribute** button to add a role.\n4. In the Add Attribute dialog, set **Display name** to `Primary role` and **Variable name** to `primary_role`. \n![add okta role attribute](https://docs.databricks.com/_images/okta-add-attrib-role.png)\n5. Return to the Profile Editor and click the **Profile** edit button for the Databricks provisioning app user profile.\n6. Click the **+ Add Attribute** button to add a role.\n7. On the Add Attribute dialog, give the role attribute the following values: \n**Display name**: `Primary role` \n**Variable name**: `primary_role` \n**External Name** in the format `roles.^[type=='$TYPE'].value`, where `$TYPE` is a string describing the role. For example, if `$TYPE` is `primary`, the External Name is `roles.^[type=='primary'].value`. \nImportant \nIn the External Name format, you must use apostrophe characters (`'`). If you use curly quote characters (`\u2019`), a `Request is unparseable` error occurs. \n**External Namespace**: `urn:ietf:params:scim:schemas:core:2.0:User`. \n![add Databricks role attribute](https://docs.databricks.com/_images/okta-add-databricks-attrib-role.png)\n8. Return to the Profile Editor and click the **Mappings** edit button for the Databricks provisioning app user profile.\n9. 
For **Databricks to Okta**, map `appuser.primary_role` in the Databricks column to `primary_role` in the Okta column.\n10. For **Okta to Databricks**, map `user.primary_role` in the Databricks column to `primary_role` in the Okta column.\n11. Click **Save Mappings**.\n12. To add a role attribute value to a user, go to **Directory > People**, select a user, and go to the **Profile** tab in the user page. \nClick the **Edit** button to enter a Primary role value for the user. When you assign the user to the app, the role is populated with the value that you entered. \nRepeat this procedure to assign additional roles.\n\n", "chunk_id": "1e01d159bfc99dcb3dae0d9ce590205d", "url": "https://docs.databricks.com/admin/users-groups/scim/okta.html"} +{"chunked_text": "# Databricks administration introduction\n## Manage users\n### service principals\n#### and groups\n##### Sync users and groups from your identity provider\n####### Configure SCIM provisioning for Okta\n######## Limitations\n\n* Removing a user from the workspace-level Okta application deactivates the user in Databricks, rather than deleting the user. 
You must [delete the user from Databricks directly](https://docs.databricks.com/admin/users-groups/scim/okta.html#delete-a-deactivated-user).\n* You can reactivate a deactivated user by removing then re-adding them to Okta, with the exact same email address.\n\n####### Configure SCIM provisioning for Okta\n######## Troubleshooting and tips\n\n* Users without either First Name or Last Name in their Databricks profiles cannot be imported to Okta as new users.\n* Users who existed in Databricks prior to provisioning setup: \n+ Are automatically linked to an Okta user if they already exist in Okta and are matched based on email address (username).\n+ Can be manually linked to an existing user or created as a new user in Okta if they are not automatically matched.\n* User permissions that are assigned individually and duplicated through membership in a group remain after the group membership is removed for the user.\n* Users removed from a Databricks workspace lose access to that workspace, but they might still have access to other Databricks workspaces.\n* The `admins` group is a reserved group in Databricks and cannot be removed.\n* You cannot rename groups in Databricks; do not attempt to rename them in Okta.\n* You can use the Databricks [Groups API](https://docs.databricks.com/api/workspace/groups) or the [Groups UI](https://docs.databricks.com/admin/users-groups/groups.html) to get a list of members of any Databricks workspace-level group.\n* You cannot update Databricks usernames and email addresses.\n\n", "chunk_id": "fe824168c641ad408545dfc987a10507", "url": "https://docs.databricks.com/admin/users-groups/scim/okta.html"} +{"chunked_text": "# Databricks documentation archive\n### Koalas\n\nImportant \nThis documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See [Pandas API on Spark](https://docs.databricks.com/pandas/pandas-on-spark.html). 
\nNote \nKoalas is deprecated. If you try using Koalas on clusters that run [Databricks Runtime 10.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/10.0.html) and above, an informational message displays, recommending that you use [Pandas API on Spark](https://docs.databricks.com/pandas/pandas-on-spark.html) instead. \n[Koalas](https://koalas.readthedocs.io/en/latest/index.html) provides a drop-in replacement for [pandas](https://pandas.pydata.org). Commonly used by data scientists, pandas is a Python package that provides easy-to-use data structures and data analysis tools for the Python programming language. However, pandas does not scale out to big data. Koalas fills this gap by providing pandas-equivalent APIs that work on Apache Spark. Koalas is useful not only for pandas users but also for PySpark users, because Koalas supports many tasks that are difficult to do with PySpark, for example plotting data directly from a PySpark DataFrame.\n\n", "chunk_id": "b40c94753585943633356cf076fec5d8", "url": "https://docs.databricks.com/archive/legacy/koalas.html"} +{"chunked_text": "# Databricks documentation archive\n### Koalas\n#### Requirements\n\n* Koalas is included on clusters running Databricks Runtime 7.3 through 9.1. 
For clusters running Databricks Runtime 10.0 and above, use [Pandas API on Spark](https://docs.databricks.com/pandas/pandas-on-spark.html) instead.\n* To use Koalas on a cluster running Databricks Runtime 7.0 or below, install Koalas as a Databricks [PyPI library](https://docs.databricks.com/libraries/package-repositories.html#pypi-libraries).\n* To use Koalas in an IDE, notebook server, or other custom applications that connect to a Databricks cluster, install [Databricks Connect](https://docs.databricks.com/dev-tools/databricks-connect/index.html) and follow the [Koalas installation instructions](https://koalas.readthedocs.io/en/latest/getting_started/install.html).\n\n### Koalas\n#### Notebook\n\nThe following notebook shows how to migrate from pandas to Koalas. \n### pandas to Koalas notebook \n[Open notebook in new tab](https://docs.databricks.com/_extras/notebooks/source/pandas-to-koalas-in-10-minutes.html)\n\n### Koalas\n#### Resources\n\n* [Koalas documentation](https://koalas.readthedocs.io/en/latest/index.html)\n* [10 Minutes from pandas to Koalas on Apache Spark](https://databricks.com/blog/2020/03/31/10-minutes-from-pandas-to-koalas-on-apache-spark.html)\n\n", "chunk_id": "5cf84d7fe1418e7f695dd818d91fe8b8", "url": "https://docs.databricks.com/archive/legacy/koalas.html"} +{"chunked_text": "# Query data\n### Data format options\n\nDatabricks has built-in keyword bindings for all of the data formats natively supported by Apache Spark. Databricks uses Delta Lake as the default protocol for reading and writing data and tables, whereas Apache Spark uses Parquet. \nThese articles provide an overview of many of the options and configurations available when you query data on Databricks. 
\nThe following data formats have built-in keyword configurations in Apache Spark DataFrames and SQL: \n* [Delta Lake](https://docs.databricks.com/delta/index.html)\n* [Delta Sharing](https://docs.databricks.com/query/formats/deltasharing.html)\n* [Parquet](https://docs.databricks.com/query/formats/parquet.html)\n* [ORC](https://docs.databricks.com/query/formats/orc.html)\n* [JSON](https://docs.databricks.com/query/formats/json.html)\n* [CSV](https://docs.databricks.com/query/formats/csv.html)\n* [Avro](https://docs.databricks.com/query/formats/avro.html)\n* [Text](https://docs.databricks.com/query/formats/text.html)\n* [Binary](https://docs.databricks.com/query/formats/binary.html)\n* [XML](https://docs.databricks.com/query/formats/xml.html) \nDatabricks also provides a custom keyword for loading [MLflow experiments](https://docs.databricks.com/query/formats/mlflow-experiment.html).\n\n", "chunk_id": "a00ae1a74e8e4115bfcc48084d1283dc", "url": "https://docs.databricks.com/query/formats/index.html"} +{"chunked_text": "# Query data\n### Data format options\n#### Data formats with special considerations\n\nSome data formats require additional configuration or special considerations for use: \n* Databricks recommends loading [images](https://docs.databricks.com/query/formats/image.html) as `binary` data.\n* [Hive tables](https://docs.databricks.com/query/formats/hive-tables.html) are natively supported by Apache Spark, but require configuration on Databricks.\n* Databricks can directly read compressed files in many file formats. You can also [unzip compressed files](https://docs.databricks.com/files/unzip-files.html) on Databricks if necessary.\n* [LZO](https://docs.databricks.com/query/formats/lzo.html) requires a codec installation. 
\nFor more information about Apache Spark data sources, see [Generic Load/Save Functions](https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html) and [Generic File Source Options](https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html).\n\n", "chunk_id": "c7a6e5baf72587272d8212f53a865d68", "url": "https://docs.databricks.com/query/formats/index.html"} +{"chunked_text": "# Security and compliance guide\n## Networking\n### Classic compute plane networking\n", "chunk_id": "4af60a88300ab7cb9d12529c7bfc2a57", "url": "https://docs.databricks.com/security/network/classic/customer-managed-vpc.html"} +{"chunked_text": "# Security and compliance guide\n## Networking\n### Classic compute plane networking\n##### Configure a customer-managed VPC\n###### Overview\n\nBy default, clusters are created in a single AWS VPC (Virtual Private Cloud) that Databricks creates and configures in your AWS account. You can optionally create your Databricks workspaces in your own VPC, a feature known as *customer-managed VPC*. You can use a customer-managed VPC to exercise more control over your network configurations to comply with specific cloud security and governance standards your organization may require. To configure your workspace to use [AWS PrivateLink](https://docs.databricks.com/security/network/classic/privatelink.html) for any type of connection, your workspace must use a customer-managed VPC. \nA customer-managed VPC is a good solution if you have: \n* Security policies that prevent PaaS providers from creating VPCs in your own AWS account.\n* An approval process to create a new VPC, in which the VPC is configured and secured in a well-documented way by internal information security or cloud engineering teams. \nBenefits include: \n* **Lower privilege level**: You maintain more control of your own AWS account. And you don\u2019t need to grant Databricks as many permissions via cross-account IAM role as you do for a Databricks-managed VPC. 
For example, there is no need for permission to create VPCs. This limited set of permissions can make it easier to get approval to use Databricks in your platform stack.\n* **Simplified network operations**: Better network space utilization. Optionally configure smaller subnets for a workspace, compared to the default CIDR /16. And there is no need for the complex VPC peering configurations that might be necessary with other solutions.\n* **Consolidation of VPCs**: Multiple Databricks workspaces can share a single classic compute plane VPC, which is often preferred for billing and instance management.\n* **Limit outgoing connections**: By default, the classic compute plane does not limit outgoing connections from Databricks Runtime workers. For workspaces that are configured to use a customer-managed VPC, you can use an egress firewall or proxy appliance to limit outbound traffic to a list of allowed internal or external data sources. \n![Customer-managed VPC](https://docs.databricks.com/_images/customer-managed-vpc.png) \nTo take advantage of a customer-managed VPC, you must specify a VPC when you first create the Databricks workspace. You cannot move an existing workspace with a Databricks-managed VPC to use a customer-managed VPC. You can, however, move an existing workspace with a customer-managed VPC from one VPC to another VPC by updating the workspace configuration\u2019s network configuration object. See [Update a running or failed workspace](https://docs.databricks.com/admin/workspace/update-workspace.html). \nTo deploy a workspace in your own VPC, you must: \n1. Create the VPC following the requirements enumerated in [VPC requirements](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#vpc-requirements).\n2. Reference your VPC network configuration with Databricks when you create the workspace. 
\n* [Use the account console](https://docs.databricks.com/admin/workspace/create-workspace.html) and choose the configuration by name\n* [Use the Account API](https://docs.databricks.com/admin/workspace/create-workspace-api.html) and choose the configuration by its ID \nYou must provide the VPC ID, subnet IDs, and security group ID when you register the VPC with Databricks.\n\n", "chunk_id": "7eec3048e7bd8b6a28915bd4afedf5c3", "url": "https://docs.databricks.com/security/network/classic/customer-managed-vpc.html"} +{"chunked_text": "# Security and compliance guide\n## Networking\n### Classic compute plane networking\n##### Configure a customer-managed VPC\n###### VPC requirements\n\nYour VPC must meet the requirements described in this section in order to host a Databricks workspace. \nRequirements: \n* [VPC region](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#vpc-region)\n* [VPC sizing](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#vpc-sizing)\n* [VPC IP address ranges](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#vpc-ip-address-ranges)\n* [DNS](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#dns)\n* [Subnets](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#subnets)\n* [Security groups](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#security-groups)\n* [Subnet-level network ACLs](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#subnet-level-network-acls)\n* [AWS PrivateLink support](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#aws-privatelink-support) \n### [VPC region](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#id1) \nFor a list of AWS regions that support customer-managed VPC, see [Databricks clouds and regions](https://docs.databricks.com/resources/supported-regions.html). 
\n### [VPC sizing](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#id2) \nYou can share one VPC with multiple workspaces in a single AWS account. However, Databricks recommends using unique subnets and security groups for each workspace. Be sure to size your VPC and subnets accordingly. Databricks assigns two IP addresses per node, one for management traffic and one for Apache Spark applications. The total number of instances for each subnet is equal to half the number of IP addresses that are available. Learn more in [Subnets](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#subnet). \n### [VPC IP address ranges](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#id3) \nDatabricks doesn\u2019t limit netmasks for the workspace VPC, but each workspace subnet must have a netmask between `/17` and `/26`. This means that if your workspace has two subnets and both have a netmask of `/26`, then the netmask for your workspace VPC must be `/25` or smaller (a shorter prefix, meaning a larger address range). \nImportant \nIf you have configured secondary CIDR blocks for your VPC, make sure that the subnets for the Databricks workspace are configured with the same VPC CIDR block. \n### [DNS](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#id4) \nThe VPC must have DNS hostnames and DNS resolution enabled. \n### [Subnets](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#id5) \nDatabricks must have access to at least *two subnets for each workspace*, with each subnet in a different availability zone. You cannot specify more than one Databricks workspace subnet per Availability Zone in the [Create network configuration API call](https://docs.databricks.com/api/account/networks/create). You can have more than one subnet per availability zone as part of your network setup, but you can choose only one subnet per Availability Zone for the Databricks workspace. 
\nYou can choose to share one subnet across multiple workspaces or both subnets across workspaces. For example, you can have two workspaces that share the same VPC. One workspace can use subnets `A` and `B` and another workspace can use subnets `A` and `C`. If you plan to share subnets across multiple workspaces, be sure to size your VPC and subnets to be large enough to scale with usage. \nDatabricks assigns two IP addresses per node, one for management traffic and one for Spark applications. The total number of instances for each subnet is equal to half of the number of IP addresses that are available. \nEach subnet must have a netmask between `/17` and `/26`. \n#### Additional subnet requirements \n* Subnets must be private.\n* Subnets must have outbound access to the public network using a [NAT gateway](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html) and [internet gateway](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html), or other similar customer-managed appliance infrastructure.\n* The NAT gateway must be set up [in its own subnet](https://aws.amazon.com/premiumsupport/knowledge-center/nat-gateway-vpc-private-subnet/) that routes quad-zero (`0.0.0.0/0`) traffic to an [internet gateway](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html) or other customer-managed appliance infrastructure. \nImportant \nWorkspaces must have outbound access from the VPC to the public network. If you configure IP access lists, those public networks must be added to an allow list. See [Configure IP access lists for workspaces](https://docs.databricks.com/security/network/front-end/ip-access-list-workspace.html). \n#### Subnet route table \nThe route table for workspace subnets must have quad-zero (`0.0.0.0/0`) traffic that targets the appropriate network device. Quad-zero traffic must target a NAT Gateway or your own managed NAT device or proxy appliance. 
\nImportant \nDatabricks requires subnets to add `0.0.0.0/0` to your allow list. This rule must be prioritized. To control egress traffic, use an egress firewall or proxy appliance to block most traffic but allow the URLs that Databricks needs to connect to. See [Configure a firewall and outbound access](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#firewall). \nThis is a base guideline only. Your configuration requirements may differ. For questions, contact your Databricks account team. \n### [Security groups](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#id6) \nA Databricks workspace must have access to at least one AWS security group and no more than five security groups. You can reuse existing security groups rather than create new ones. However, Databricks recommends using unique subnets and security groups for each workspace. \nSecurity groups must have the following rules: \n**Egress (outbound):** \n* Allow all TCP and UDP access to the workspace security group (for internal traffic)\n* Allow TCP access to `0.0.0.0/0` for these ports: \n+ 443: for Databricks infrastructure, cloud data sources, and library repositories\n+ 3306: for the metastore\n+ 6666: for secure cluster connectivity. This is only required if you use [PrivateLink](https://docs.databricks.com/security/network/classic/privatelink.html).\n+ 2443: Supports FIPS encryption. Only required if you enable the [compliance security profile](https://docs.databricks.com/security/privacy/security-profile.html).\n+ 8443 through 8451: Future extendability. Ensure these [ports are open by January 31, 2024](https://docs.databricks.com/release-notes/product/2023/august.html#aws-new-egress-ports). 
\n**Ingress (inbound):** Required for all workspaces (these can be separate rules or combined into one): \n* Allow TCP on all ports when traffic source uses the same security group\n* Allow UDP on all ports when traffic source uses the same security group \n### [Subnet-level network ACLs](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#id7) \nSubnet-level network ACLs must not deny ingress or egress to any traffic. Databricks validates for the following rules while creating the workspace: \n**Egress (outbound):** \n* Allow all traffic to the workspace VPC CIDR, for internal traffic \n+ Allow TCP access to `0.0.0.0/0` for these ports: \n- 443: for Databricks infrastructure, cloud data sources, and library repositories\n- 3306: for the metastore\n- 6666: only required if you use [PrivateLink](https://docs.databricks.com/security/network/classic/privatelink.html) \nImportant \nIf you configure additional `ALLOW` or `DENY` rules for outbound traffic, set the rules required by Databricks to the highest priority (the lowest rule numbers), so that they take precedence. \n**Ingress (inbound):** \n* `ALLOW ALL from Source 0.0.0.0/0`. This rule must be prioritized. \nNote \nDatabricks requires subnet-level network ACLs to add `0.0.0.0/0` to your allow list. To control egress traffic, use an egress firewall or proxy appliance to block most traffic but allow the URLs that Databricks needs to connect to. See [Configure a firewall and outbound access](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#firewall). 
\n### [AWS PrivateLink support](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#id8) \nIf you plan to enable AWS PrivateLink on the workspace with this VPC: \n* On the VPC, ensure that you enable both of the settings **DNS Hostnames** and **DNS resolution**.\n* Review the article [Enable AWS PrivateLink](https://docs.databricks.com/security/network/classic/privatelink.html) for guidance about creating an extra subnet for VPC endpoints (recommended but not required) and creating an extra security group for VPC endpoints.\n\n", "chunk_id": "55f8e6855771cceefce050ce669f6a60", "url": "https://docs.databricks.com/security/network/classic/customer-managed-vpc.html"} +{"chunked_text": "# Security and compliance guide\n## Networking\n### Classic compute plane networking\n##### Configure a customer-managed VPC\n###### Create a VPC\n\nTo create VPCs you can use various tools: \n* AWS console\n* AWS CLI\n* [Terraform](https://docs.databricks.com/dev-tools/terraform/index.html)\n* [AWS Quickstart](https://docs.databricks.com/admin/workspace/templates.html) (create a new customer-managed VPC and a new workspace) \nIf you use the AWS console, the basic instructions for creating and configuring a VPC and related objects are listed below. For complete instructions, see the AWS documentation. \nNote \nThese basic instructions might not apply to all organizations. Your configuration requirements may differ. This section does not cover all possible ways to configure NATs, firewalls, or other network infrastructure. If you have questions, contact your Databricks account team before proceeding. \n1. Go to the [VPCs page in AWS](https://console.aws.amazon.com/vpc/#vpcs:).\n2. See the region picker in the upper-right. If needed, switch to the region for your workspace.\n3. In the upper-right corner, click the orange button **Create VPC**. \n![create new VPC editor](https://docs.databricks.com/_images/customer-managed-vpc-createnew.png)\n4. 
Click **VPC and more**.\n5. In the **Name tag auto-generation** type a name for your workspace. Databricks recommends including the region in the name.\n6. For VPC address range, optionally change it if desired.\n7. For public subnets, click `2`. Those subnets aren\u2019t used directly by your Databricks workspace, but they are required to enable NATs in this editor.\n8. For private subnets, click `2` for the minimum for workspace subnets. You can add more if desired. \nYour Databricks workspace needs at least two private subnets. To resize them, click **Customize subnet CIDR blocks**.\n9. For NAT gateways, click **In 1 AZ**.\n10. Ensure the following fields at the bottom are enabled: **Enable DNS hostnames** and **Enable DNS resolution**.\n11. Click **Create VPC**.\n12. When viewing your new VPC, click on the left navigation items to update related settings on the VPC. To make it easier to find related objects, in the **Filter by VPC** field, select your new VPC.\n13. Click **Subnets** and what AWS calls the **private** subnets labeled 1 and 2, which are the ones you will use to configure your main workspace subnets. Modify the subnets as specified in [VPC requirements](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#vpc-requirements). \nIf you created an extra private subnet for use with PrivateLink, configure private subnet 3 as specified in [Enable AWS PrivateLink](https://docs.databricks.com/security/network/classic/privatelink.html).\n14. Click **Security groups** and modify the security group as specified in [Security groups](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#security-groups). \nIf you will use back-end PrivateLink connectivity, create an additional security group with inbound and outbound rules as specified in the PrivateLink article in the section [Step 1: Configure AWS network objects](https://docs.databricks.com/security/network/classic/privatelink.html#create-vpc).\n15. 
Click **Network ACLs** and modify the network ACLs as specified in [Subnet-level network ACLs](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#network-acls).\n16. Choose whether to perform the optional configurations that are specified later in this article.\n17. Register your VPC with Databricks to create a network configuration [using the account console](https://docs.databricks.com/admin/account-settings-e2/networks.html) or by [using the Account API](https://docs.databricks.com/admin/workspace/create-workspace-api.html).\n\n", "chunk_id": "db2ab8a83c2de0f96abfd7d95456489f", "url": "https://docs.databricks.com/security/network/classic/customer-managed-vpc.html"} +{"chunked_text": "# Security and compliance guide\n## Networking\n### Classic compute plane networking\n##### Configure a customer-managed VPC\n###### Updating CIDRs\n\nYou might later need to update subnet CIDRs that overlap with the original subnets. \nTo update the CIDRs and other workspace objects: \n1. Terminate all running clusters (and other compute resources) that are running in the subnets that need to be updated.\n2. Using the AWS console, delete the subnets to update.\n3. Re-create the subnets with updated CIDR ranges.\n4. Update the route table association for the two new subnets. You can reuse the ones in each availability zone for existing subnets. \nImportant \nIf you skip this step or misconfigure the route tables, clusters may fail to launch.\n5. Create a new network configuration object with the new subnets.\n6. 
Update the workspace to use this newly created network configuration object\n\n", "chunk_id": "1c5f274f1572c74bb1ccdb6cbdfc4370", "url": "https://docs.databricks.com/security/network/classic/customer-managed-vpc.html"} +{"chunked_text": "# Security and compliance guide\n## Networking\n### Classic compute plane networking\n##### Configure a customer-managed VPC\n###### (Recommended) Configure regional endpoints\n\nIf you use a customer-managed VPC (optional), Databricks recommends you configure your VPC to use only regional VPC endpoints to AWS services. Using regional VPC endpoints enables more direct connections to AWS services and reduced cost compared to AWS global endpoints. There are four AWS services that a Databricks workspace with a customer-managed VPC must reach: STS, S3, Kinesis, and RDS. \nThe connection from your VPC to the RDS service is required only if you use the default Databricks legacy Hive metastore and does not apply to Unity Catalog metastores. Although there is no VPC endpoint for RDS, instead of using the default Databricks legacy Hive metastore, you can configure your own external metastore. You can implement an external metastore with a [Hive metastore](https://docs.databricks.com/archive/external-metastores/external-hive-metastore.html) or [AWS Glue](https://docs.databricks.com/archive/external-metastores/aws-glue-metastore.html). \nFor the other three services, you can create VPC gateway or interface endpoints such that the relevant in-region traffic from clusters could transit over the secure AWS backbone rather than the public network: \n* **S3**: Create a [VPC gateway endpoint](https://aws.amazon.com/blogs/aws/new-vpc-endpoint-for-amazon-s3) that is directly accessible from your Databricks cluster subnets. This causes workspace traffic to all in-region S3 buckets to use the endpoint route. 
To access any cross-region buckets, open up access to S3 global URL `s3.amazonaws.com` in your egress appliance, or route `0.0.0.0/0` to an AWS internet gateway. \nTo use [DBFS mounts](https://docs.databricks.com/dbfs/mounts.html) with regional endpoints enabled: \n+ You must set up an environment variable in the cluster configuration to set `AWS_REGION=`. For example, if your workspace is deployed in the N. Virginia region, set `AWS_REGION=us-east-1`. To enforce it for all clusters, use [cluster policies](https://docs.databricks.com/admin/clusters/policies.html).\n* **STS**: Create a [VPC interface endpoint](https://docs.aws.amazon.com/vpc/latest/userguide/vpce-interface.html#create-interface-endpoint) directly accessible from your Databricks cluster subnets. You can create this endpoint in your workspace subnets. Databricks recommends that you use the same security group that was created for your workspace VPC. This configuration causes workspace traffic to STS to use the endpoint route.\n* **Kinesis**: Create a [VPC interface endpoint](https://docs.aws.amazon.com/vpc/latest/userguide/vpce-interface.html#create-interface-endpoint) directly accessible from your Databricks cluster subnets. You can create this endpoint in your workspace subnets. Databricks recommends that you use the same security group that was created for your workspace VPC. This configuration causes workspace traffic to Kinesis to use the endpoint route. 
The only exception to this rule is workspaces in the AWS region `us-west-1` because target Kinesis streams in this region are cross-region to the `us-west-2` region.\n\n", "chunk_id": "1f5730f08576cedca9b79267920921a3", "url": "https://docs.databricks.com/security/network/classic/customer-managed-vpc.html"} +{"chunked_text": "# Security and compliance guide\n## Networking\n### Classic compute plane networking\n##### Configure a customer-managed VPC\n###### Configure a firewall and outbound access\n\nYou must use an egress firewall or proxy appliance to block most traffic but allow the URLs that Databricks needs to connect to: \n* If the firewall or proxy appliance is in the same VPC as the Databricks workspace VPC, route the traffic and configure it to allow the following connections.\n* If the firewall or proxy appliance is in a different VPC or an on-premises network, route `0.0.0.0/0` to that VPC or network first and configure the proxy appliance to allow the following connections. \nImportant \nDatabricks strongly recommends that you specify destinations as domain names in your egress infrastructure, rather than as IP addresses. \nAllow the following outgoing connections. For each connection type, follow the link to get IP addresses or domains for your workspace region. \n* **Databricks web application**:Required. Also used for REST API calls to your workspace. \n[Databricks control plane addresses](https://docs.databricks.com/resources/supported-regions.html#control-plane-ip-addresses)\n* **Databricks secure cluster connectivity (SCC) relay**: Required for secure cluster connectivity. \n[Databricks control plane addresses](https://docs.databricks.com/resources/supported-regions.html#control-plane-ip-addresses)\n* **AWS S3 global URL**:Required by Databricks to access the root S3 bucket. Use `s3.amazonaws.com:443`, regardless of region.\n* **AWS S3 regional URL**:Optional. 
If you use S3 buckets that might be in other regions, you must also allow the S3 regional endpoint. Although AWS provides a domain and port for a regional endpoint (`s3..amazonaws.com:443`), Databricks recommends that you instead use a [VPC endpoint](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#regional-endpoints) so that this traffic goes through the private tunnel over the AWS network backbone. See [(Recommended) Configure regional endpoints](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#regional-endpoints).\n* **AWS STS global URL**:Required. Use the following address and port, regardless of region: `sts.amazonaws.com:443`\n* **AWS STS regional URL**:Required due to expected switch to regional endpoint. Use a VPC endpoint. See [(Recommended) Configure regional endpoints](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#regional-endpoints).\n* **AWS Kinesis regional URL**:Required. The Kinesis endpoint is used to capture logs needed to manage and monitor the software. For the URL for your region, see [Kinesis addresses](https://docs.databricks.com/resources/supported-regions.html#kinesis).\n* **Table metastore RDS regional URL (by compute plane region)**:Required if your Databricks workspace uses the default Hive metastore. \nThe Hive metastore is always in the same region as your compute plane, but it might be in a different region than the control plane. \n[RDS addresses for legacy Hive metastore](https://docs.databricks.com/resources/supported-regions.html#rds) \nNote \nInstead of using the default Hive metastore, you can choose to [implement your own table metastore instance](https://docs.databricks.com/archive/external-metastores/index.html), in which case you are responsible for its network routing.\n* **Control plane infrastructure**: Required. Used by Databricks for standby Databricks infrastructure to improve the stability of Databricks services. 
\n[Databricks control plane addresses](https://docs.databricks.com/resources/supported-regions.html#control-plane-ip-addresses) \n### Troubleshoot regional endpoints \nIf you followed the instructions above and the VPC endpoints do not work as intended, for example, if your data sources are inaccessible or if the traffic is bypassing the endpoints, you can use one of two approaches to add support for the regional endpoints for S3 and STS instead of using VPC endpoints. \n1. Add the environment variable `AWS_REGION` in the cluster configuration and set it to your AWS region. To enable it for all clusters, use [cluster policies](https://docs.databricks.com/admin/clusters/policies.html). You might have already configured this environment variable to use DBFS mounts.\n2. Add the required Apache Spark configuration. Do exactly one of the following approaches: \n* **In each source notebook**: \n```\n%scala\nspark.conf.set(\"fs.s3a.stsAssumeRole.stsEndpoint\", \"https://sts..amazonaws.com\")\nspark.conf.set(\"fs.s3a.endpoint\", \"https://s3..amazonaws.com\")\n\n``` \n```\n%python\nspark.conf.set(\"fs.s3a.stsAssumeRole.stsEndpoint\", \"https://sts..amazonaws.com\")\nspark.conf.set(\"fs.s3a.endpoint\", \"https://s3..amazonaws.com\")\n\n```\n* **Alternatively, in the Apache Spark config for the cluster**: \n```\nspark.hadoop.fs.s3a.endpoint https://s3..amazonaws.com\nspark.hadoop.fs.s3a.stsAssumeRole.stsEndpoint https://sts..amazonaws.com\n\n```\n3. If you limit egress from the classic compute plane using a firewall or internet appliance, add these regional endpoint addresses to your allow list. 
\nTo set these values for all clusters, configure the values as part of your [cluster policy](https://docs.databricks.com/admin/clusters/policies.html).\n\n", "chunk_id": "5cb215222c56e6baf1dae7cab049aba5", "url": "https://docs.databricks.com/security/network/classic/customer-managed-vpc.html"} +{"chunked_text": "# Security and compliance guide\n## Networking\n### Classic compute plane networking\n##### Configure a customer-managed VPC\n###### (Optional) Access S3 using instance profiles\n\nTo access S3 mounts using [instance profiles](https://docs.databricks.com/connect/storage/tutorial-s3-instance-profile.html), set the following Spark configurations: \n* Either **in each source notebook**: \n```\n%scala\nspark.conf.set(\"fs.s3a.stsAssumeRole.stsEndpoint\", \"https://sts..amazonaws.com\")\nspark.conf.set(\"fs.s3a.endpoint\", \"https://s3..amazonaws.com\")\n\n``` \n```\n%python\nspark.conf.set(\"fs.s3a.stsAssumeRole.stsEndpoint\", \"https://sts..amazonaws.com\")\nspark.conf.set(\"fs.s3a.endpoint\", \"https://s3..amazonaws.com\")\n\n```\n* Or **in the Apache Spark config for the cluster**: \n```\nspark.hadoop.fs.s3a.endpoint https://s3..amazonaws.com\nspark.hadoop.fs.s3a.stsAssumeRole.stsEndpoint https://sts..amazonaws.com\n\n``` \nTo set these values for all clusters, configure the values as part of your [cluster policy](https://docs.databricks.com/admin/clusters/policies.html). \nWarning \nFor the S3 service, there are limitations to applying additional regional endpoint configurations at the notebook or cluster level. Notably, cross-region S3 access is blocked, even if the global S3 URL is allowed in your egress firewall or proxy. 
If your Databricks deployment might require cross-region S3 access, it is important that you not apply the Spark configuration at the notebook or cluster level.\n\n", "chunk_id": "4eaa24bed2515318b3b070db1937fd25", "url": "https://docs.databricks.com/security/network/classic/customer-managed-vpc.html"} +{"chunked_text": "# Security and compliance guide\n## Networking\n### Classic compute plane networking\n##### Configure a customer-managed VPC\n###### (Optional) Restrict access to S3 buckets\n\nMost reads from and writes to S3 are self-contained within the compute plane. However, some management operations originate from the control plane, which is managed by Databricks. To limit access to S3 buckets to a specified set of source IP addresses, create an S3 bucket policy. In the bucket policy, include the IP addresses in the `aws:SourceIp` list. If you use a VPC Endpoint, allow access to it by adding it to the policy\u2019s `aws:sourceVpce`. Databricks uses VPC IDs for accessing S3 buckets in the same region as the Databricks control plane, and NAT IPs for accessing S3 buckets in different regions from the control plane. \nFor more information about S3 bucket policies, see the [bucket policy examples](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-bucket-policies.html#example-bucket-policies-use-case-3) in the Amazon S3 documentation. Working [example bucket policies](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#example-bucket-policies) are also included in this topic. 
\n### Requirements for bucket policies \nYour bucket policy must meet these requirements to ensure that your clusters start correctly and that you can connect to them: \n* You must allow access from the [control plane NAT IP and VPC IDs for your region](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#required-ips-and-storage-buckets).\n* You must allow access from the compute plane VPC, by doing one of the following: \n+ (Recommended) Configure a gateway VPC Endpoint in your [Customer-managed VPC](https://docs.databricks.com/admin/cloud-configurations/aws/customer-managed-vpc.html) and add it to the `aws:sourceVpce` list in the bucket policy, or\n+ Add the compute plane NAT IP to the `aws:SourceIp` list.\n* **When using [Endpoint policies for Amazon S3](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html#vpc-endpoints-policies-s3)**, your policy must include: \n+ Your workspace\u2019s [root storage bucket](https://docs.databricks.com/admin/account-settings-e2/storage.html).\n+ The required [artifact, log, system tables, and shared datasets bucket for your region](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#required-ips-and-storage-buckets).\n* **To avoid losing connectivity from within your corporate network**, Databricks recommends always allowing access from at least one known and trusted IP address, such as the public IP of your corporate VPN. This is because Deny conditions apply even within the AWS console. \nNote \nWhen deploying a new workspace with S3 bucket policy restrictions, you must allow access to the control plane NAT-IP for a `us-west` region; otherwise, the deployment fails. After the workspace is deployed, you can remove the `us-west` info and update the control plane NAT-IP to reflect your region. 
\n### Required IPs and storage buckets \nFor the IP addresses and domains that you need for configuring S3 bucket policies and VPC Endpoint policies to restrict access to your workspace\u2019s S3 buckets, see [Outbound from Databricks control plane](https://docs.databricks.com/resources/supported-regions.html#outbound). \n### Example bucket policies \nThese examples use placeholder text to indicate where to specify recommended IP addresses and required storage buckets. Review the [requirements](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#requirements-for-bucket-policies) to ensure that your clusters start correctly and that you can connect to them. \n**Restrict access to the Databricks control plane, compute plane, and trusted IPs:** \nThis S3 bucket policy uses a Deny condition to selectively allow access from the control plane, NAT gateway, and corporate VPN IP addresses you specify. Replace the placeholder text with values for your environment. You can add any number of IP addresses to the policy. Create one policy per S3 bucket you want to protect. \nImportant \nIf you use VPC Endpoints, this policy is not complete. See [Restrict access to the Databricks control plane, VPC endpoints, and trusted IPs](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html#example-bucket-policy-vpce). \n```\n{\n\"Sid\": \"IPDeny\",\n\"Effect\": \"Deny\",\n\"Principal\": \"*\",\n\"Action\": \"s3:*\",\n\"Resource\": [\n\"arn:aws:s3:::\",\n\"arn:aws:s3:::/*\"\n],\n\"Condition\": {\n\"NotIpAddress\": {\n\"aws:SourceIp\": [\n\"\",\n\"\",\n\"\"\n]\n}\n}\n}\n\n``` \n**Restrict access to the Databricks control plane, VPC endpoints, and trusted IPs:** \nIf you use a VPC Endpoint to access S3, you must add a second condition to the policy. This condition allows access from your VPC Endpoint and VPC ID by adding it to the `aws:sourceVpce` list. 
\nThis bucket selectively allows access from your VPC Endpoint, and from the control plane and corporate VPN IP addresses you specify. \nWhen using VPC Endpoints, you can use a VPC Endpoint policy instead of an S3 bucket policy. A VPCE policy must allow access to your root S3 bucket and to the required artifact, log, and shared datasets bucket for your region. For the IP addresses and domains for your regions, see [IP addresses and domains](https://docs.databricks.com/resources/supported-regions.html#ip-domain-aws). \nReplace the placeholder text with values for your environment. \n```\n{\n\"Sid\": \"IPDeny\",\n\"Effect\": \"Deny\",\n\"Principal\": \"*\",\n\"Action\": \"s3:*\",\n\"Resource\": [\n\"arn:aws:s3:::\",\n\"arn:aws:s3:::/*\"\n],\n\"Condition\": {\n\"NotIpAddressIfExists\": {\n\"aws:SourceIp\": [\n\"\",\n\"\"\n]\n},\n\"StringNotEqualsIfExists\": {\n\"aws:sourceVpce\": \"\",\n\"aws:SourceVPC\": \"\"\n}\n}\n}\n\n```\n\n", "chunk_id": "767e532a0770a32f75343d9013e9da1d", "url": "https://docs.databricks.com/security/network/classic/customer-managed-vpc.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### Window functions\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nFunctions that operate on a group of rows, referred to as a window, and calculate a return value for each row based on the group of rows.\nWindow functions are useful for processing tasks such as calculating a moving average, computing a cumulative statistic, or accessing the value of rows given the relative position of the current row.\n\n#### Window functions\n##### Syntax\n\n```\nfunction OVER { window_name | ( window_name ) | window_spec }\n\nfunction\n{ ranking_function | analytic_function | aggregate_function }\n\nover_clause\nOVER { window_name | ( window_name ) | window_spec }\n\nwindow_spec\n( [ PARTITION BY partition [ 
, ... ] ] [ order_by ] [ window_frame ] )\n\n```\n\n", "chunk_id": "32bb657aee2b1a9273d24ce1311ba36a", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-window-functions.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### Window functions\n##### Parameters\n\n* **function** \nThe function operating on the window. Different classes of functions support different configurations of window specifications. \n+ ranking\\_function \nAny of the [Ranking window functions](https://docs.databricks.com/sql/language-manual/sql-ref-functions-builtin.html#ranking-window-functions). \nIf specified the window\\_spec must include an [ORDER BY clause](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-orderby.html), but not a window\\_frame clause.\n+ analytic\\_function \nAny of the [Analytic window functions](https://docs.databricks.com/sql/language-manual/sql-ref-functions-builtin.html#analytic-window-functions).\n+ aggregate\\_function \nAny of the [Aggregate functions](https://docs.databricks.com/sql/language-manual/sql-ref-functions-builtin.html#aggregate-functions). \nIf specified the function must not include a FILTER clause.\n* **window\\_name** \nIdentifies a [named window](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-named-window.html) specification defined by the [query](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-query.html).\n* **window\\_spec** \nThis clause defines how the rows will be grouped, sorted within the group, and which rows within a partition a function operates on. 
\n+ partition \nOne or more expressions used to specify a group of rows defining the scope on which the function operates.\nIf no PARTITION BY clause is specified, the partition comprises all rows.\n+ order\\_by \nThe [ORDER BY clause](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-orderby.html) specifies the order of rows within a partition.\n+ window\\_frame \nThe [window frame clause](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-window-functions-frame.html) specifies a sliding subset of rows within the partition on which the aggregate or analytic function operates. \nYou can specify SORT BY as an alias for ORDER BY. \nYou can also specify DISTRIBUTE BY as an alias for PARTITION BY.\nYou can use CLUSTER BY as an alias for PARTITION BY in the absence of ORDER BY.\n\n", "chunk_id": "3ef190f91cdfb79356677d934efe4c15", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-window-functions.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### Window functions\n##### Examples\n\n```\n> CREATE TABLE employees\n(name STRING, dept STRING, salary INT, age INT);\n> INSERT INTO employees\nVALUES ('Lisa', 'Sales', 10000, 35),\n('Evan', 'Sales', 32000, 38),\n('Fred', 'Engineering', 21000, 28),\n('Alex', 'Sales', 30000, 33),\n('Tom', 'Engineering', 23000, 33),\n('Jane', 'Marketing', 29000, 28),\n('Jeff', 'Marketing', 35000, 38),\n('Paul', 'Engineering', 29000, 23),\n('Helen', 'Marketing', 29000, 40),\n('Chloe', 'Engineering', 23000, 25);\n\n> SELECT name, dept, salary, age FROM employees;\nChloe Engineering 23000 25\nFred Engineering 21000 28\nPaul Engineering 29000 23\nHelen Marketing 29000 40\nTom Engineering 23000 33\nJane Marketing 29000 28\nJeff Marketing 35000 38\nEvan Sales 32000 38\nLisa Sales 10000 35\nAlex Sales 30000 33\n\n> SELECT name,\ndept,\nsalary,\nRANK() OVER (PARTITION BY dept ORDER BY salary) AS rank\nFROM employees;\nLisa Sales 10000 1\nAlex Sales 30000 2\nEvan Sales 32000 3\nFred Engineering 21000 1\nTom Engineering 
23000 2\nChloe Engineering 23000 2\nPaul Engineering 29000 4\nHelen Marketing 29000 1\nJane Marketing 29000 1\nJeff Marketing 35000 3\n\n> SELECT name,\ndept,\nsalary,\nDENSE_RANK() OVER (PARTITION BY dept ORDER BY salary\nROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS dense_rank\nFROM employees;\nLisa Sales 10000 1\nAlex Sales 30000 2\nEvan Sales 32000 3\nFred Engineering 21000 1\nTom Engineering 23000 2\nChloe Engineering 23000 2\nPaul Engineering 29000 3\nHelen Marketing 29000 1\nJane Marketing 29000 1\nJeff Marketing 35000 2\n\n> SELECT name,\ndept,\nage,\nCUME_DIST() OVER (PARTITION BY dept ORDER BY age\nRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cume_dist\nFROM employees;\nAlex Sales 33 0.3333333333333333\nLisa Sales 35 0.6666666666666666\nEvan Sales 38 1.0\nPaul Engineering 23 0.25\nChloe Engineering 25 0.50\nFred Engineering 28 0.75\nTom Engineering 33 1.0\nJane Marketing 28 0.3333333333333333\nJeff Marketing 38 0.6666666666666666\nHelen Marketing 40 1.0\n\n> SELECT name,\ndept,\nsalary,\nMIN(salary) OVER (PARTITION BY dept ORDER BY salary) AS min\nFROM employees;\nLisa Sales 10000 10000\nAlex Sales 30000 10000\nEvan Sales 32000 10000\nHelen Marketing 29000 29000\nJane Marketing 29000 29000\nJeff Marketing 35000 29000\nFred Engineering 21000 21000\nTom Engineering 23000 21000\nChloe Engineering 23000 21000\nPaul Engineering 29000 21000\n\n> SELECT name,\ndept,\nsalary,\nLAG(salary) OVER (PARTITION BY dept ORDER BY salary) AS lag,\nLEAD(salary, 1, 0) OVER (PARTITION BY dept ORDER BY salary) AS lead\nFROM employees;\nLisa Sales 10000 NULL 30000\nAlex Sales 30000 10000 32000\nEvan Sales 32000 30000 0\nFred Engineering 21000 NULL 23000\nChloe Engineering 23000 21000 23000\nTom Engineering 23000 23000 29000\nPaul Engineering 29000 23000 0\nHelen Marketing 29000 NULL 29000\nJane Marketing 29000 29000 35000\nJeff Marketing 35000 29000 0\n\n```\n\n", "chunk_id": "c530ee3fba83799863142b50fc32c627", "url": 
"https://docs.databricks.com/sql/language-manual/sql-ref-window-functions.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### Window functions\n##### Related articles\n\n* [SELECT](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select.html)\n* [ORDER BY](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-orderby.html)\n* [window frame clause](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-window-functions-frame.html)\n* [named window](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-named-window.html)\n* [query](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-query.html)\n* [Aggregate functions](https://docs.databricks.com/sql/language-manual/sql-ref-functions-builtin.html#aggregate-functions)\n* [Analytic window functions](https://docs.databricks.com/sql/language-manual/sql-ref-functions-builtin.html#analytic-window-functions)\n* [Ranking window functions](https://docs.databricks.com/sql/language-manual/sql-ref-functions-builtin.html#ranking-window-functions)\n\n", "chunk_id": "863e6fec896ae7e94ed997f99f8b4acf", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-window-functions.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Query\n##### SORT BY clause\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nReturns the result rows sorted within each partition in the user specified order. When there is more\nthan one partition `SORT BY` may return result that is partially ordered. 
This is different from the\n[ORDER BY](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-orderby.html) clause, which guarantees a total order of the\noutput.\n\n##### SORT BY clause\n###### Syntax\n\n```\nSORT BY { expression [ sort_direction nulls_sort_order ] } [, ...]\n\nsort_direction\n[ ASC | DESC ]\n\nnulls_sort_order\n[ NULLS FIRST | NULLS LAST ]\n\n```\n\n", "chunk_id": "6339fd291ee6a162adc70f6201f4c2c9", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-sortby.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Query\n##### SORT BY clause\n###### Parameters\n\n* **[expression](https://docs.databricks.com/sql/language-manual/sql-ref-expression.html)** \nAn expression of any type used to establish a partition local order in which results are returned. \nIf the expression is a literal INT value it is interpreted as a column position in the select list.\n* **sort\\_direction** \nSpecifies the sort order for the sort by expression. \n+ `ASC`: The sort direction for this expression is ascending.\n+ `DESC`: The sort order for this expression is descending. If sort direction is not explicitly specified, then by default rows are sorted ascending.\n* **nulls\\_sort\\_order** \nOptionally specifies whether NULL values are returned before/after non-NULL values. If\n`nulls_sort_order` is not specified, then NULLs sort first if sort order is\n`ASC` and NULLs sort last if sort order is `DESC`. \n+ `NULLS FIRST`: NULL values are returned first regardless of the sort order.\n+ `NULLS LAST`: NULL values are returned last regardless of the sort order. 
\nWhen specifying more than one expression, sorting occurs left to right.\nAll rows within the partition are sorted by the first expression.\nIf there are duplicate values for the first expression the second expression is used to resolve order within the group of duplicates and so on.\nThe resulting order is not deterministic if there are duplicate values across all order by expressions. \n### Examples \n```\n> CREATE TEMP VIEW person (zip_code, name, age)\nAS VALUES (94588, 'Zen Hui', 50),\n(94588, 'Dan Li', 18),\n(94588, 'Anil K', 27),\n(94588, 'John V', NULL),\n(94511, 'David K', 42),\n(94511, 'Aryan B.', 18),\n(94511, 'Lalit B.', NULL);\n\n-- Use `REPARTITION` hint to partition the data by `zip_code` to\n-- examine the `SORT BY` behavior. This is used in rest of the\n-- examples.\n\n-- Sort rows by `name` within each partition in ascending manner\n> SELECT /*+ REPARTITION(zip_code) */ name, age, zip_code FROM person\nSORT BY name;\nAnil K 27 94588\nDan Li 18 94588\nJohn V NULL 94588\nZen Hui 50 94588\nAryan B. 18 94511\nDavid K 42 94511\nLalit B. NULL 94511\n\n-- Sort rows within each partition using column position.\n> SELECT /*+ REPARTITION(zip_code) */ name, age, zip_code FROM person\nSORT BY 1;\nAnil K 27 94588\nDan Li 18 94588\nJohn V null 94588\nZen Hui 50 94588\nAryan B. 18 94511\nDavid K 42 94511\nLalit B. null 94511\n\n-- Sort rows within partition in ascending manner keeping null values to be last.\n> SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code FROM person\nSORT BY age NULLS LAST;\n18 Dan Li 94588\n27 Anil K 94588\n50 Zen Hui 94588\nNULL John V 94588\n18 Aryan B. 94511\n42 David K 94511\nNULL Lalit B. 94511\n\n-- Sort rows by age within each partition in descending manner, which defaults to NULLS LAST.\n> SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code FROM person\nSORT BY age DESC;\n50 Zen Hui 94588\n27 Anil K 94588\n18 Dan Li 94588\nNULL John V 94588\n42 David K 94511\n18 Aryan B. 94511\nNULL Lalit B. 
94511\n\n-- Sort rows by age within each partition in descending order, keeping NULL values first.\n> SELECT /*+ REPARTITION(zip_code) */ age, name, zip_code FROM person\nSORT BY age DESC NULLS FIRST;\nNULL John V 94588\n50 Zen Hui 94588\n27 Anil K 94588\n18 Dan Li 94588\nNULL Lalit B. 94511\n42 David K 94511\n18 Aryan B. 94511\n\n-- Sort rows within each partition based on more than one column, with each column having\n-- a different sort direction.\n> SELECT /*+ REPARTITION(zip_code) */ name, age, zip_code FROM person\nSORT BY name ASC, age DESC;\nAnil K 27 94588\nDan Li 18 94588\nJohn V NULL 94588\nZen Hui 50 94588\nAryan B. 18 94511\nDavid K 42 94511\nLalit B. NULL 94511\n\n```\n\n", "chunk_id": "e64b4ad707ff08d1d9ad57a83d862463", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-sortby.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Query\n##### SORT BY clause\n###### Related articles\n\n* [Query](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-query.html)\n\n", "chunk_id": "f1fc48d6364ad7e651491059fd76986e", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-sortby.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### CREATE FUNCTION (External)\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nCreates a temporary or permanent external function. Temporary functions are scoped at a session level, whereas\npermanent functions are created in the persistent catalog and are made available to all sessions.\nThe resources specified in the `USING` clause are made available to all executors when they are\nexecuted for the first time. \nIn addition to the SQL interface, Spark allows you to create custom user-defined scalar and aggregate functions using Scala, Python, and Java APIs. 
See [External user-defined scalar functions (UDFs)](https://docs.databricks.com/sql/language-manual/sql-ref-functions-udf-scalar.html) and [User-defined aggregate functions (UDAFs)](https://docs.databricks.com/sql/language-manual/sql-ref-functions-udf-aggregate.html) for more information.\n\n#### CREATE FUNCTION (External)\n##### Syntax\n\n```\nCREATE [ OR REPLACE ] [ TEMPORARY ] FUNCTION [ IF NOT EXISTS ]\nfunction_name AS class_name [ resource_locations ]\n\n```\n\n", "chunk_id": "f1d78acaf4b25c890d5efb1afa6a496e", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-function.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### CREATE FUNCTION (External)\n##### Parameters\n\n* **OR REPLACE** \nIf specified, the resources for the function are reloaded. This is mainly useful\nto pick up any changes made to the implementation of the function. This\nparameter is mutually exclusive with `IF NOT EXISTS`; the two cannot\nbe specified together.\n* **TEMPORARY** \nIndicates the scope of the function being created. When `TEMPORARY` is specified, the\ncreated function is valid and visible in the current session. No persistent\nentry is made in the catalog for this kind of function.\n* **IF NOT EXISTS** \nIf specified, creates the function only when it does not exist. The creation\nof the function succeeds (no error is thrown) if the specified function already\nexists in the system. This parameter is mutually exclusive with `OR REPLACE`;\nthe two cannot be specified together.\n* **[function\\_name](https://docs.databricks.com/sql/language-manual/sql-ref-names.html#function-name)** \nA name for the function. The function name may be optionally qualified with a schema name. 
\nFunctions created in `hive_metastore` can only contain alphanumeric ASCII characters and underscores.\n* **class\\_name** \nThe name of the class that provides the implementation for the function to be created.\nThe implementing class should extend one of the base classes as follows: \n+ Should extend `UDF` or `UDAF` in the `org.apache.hadoop.hive.ql.exec` package.\n+ Should extend `AbstractGenericUDAFResolver`, `GenericUDF`, or\n`GenericUDTF` in the `org.apache.hadoop.hive.ql.udf.generic` package.\n+ Should extend `UserDefinedAggregateFunction` in the `org.apache.spark.sql.expressions` package.\n* **resource\\_locations** \nThe list of resources that contain the implementation of the function\nalong with its dependencies. \n**Syntax:** `USING { { (JAR | FILE | ARCHIVE) resource_uri } , ... }`\n\n", "chunk_id": "0ca491eddc560bf18c8c9aa9f5fd2023", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-function.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### CREATE FUNCTION (External)\n##### Examples\n\n```\n-- 1. Create a simple UDF `SimpleUdf` that increments the supplied integral value by 10.\n-- import org.apache.hadoop.hive.ql.exec.UDF;\n-- public class SimpleUdf extends UDF {\n-- public int evaluate(int value) {\n-- return value + 10;\n-- }\n-- }\n-- 2. Compile and place it in a JAR file called `SimpleUdf.jar` in /tmp.\n\n-- Create a table called `test` and insert two rows.\n> CREATE TABLE test(c1 INT);\n> INSERT INTO test VALUES (1), (2);\n\n-- Create a permanent function called `simple_udf`.\n> CREATE FUNCTION simple_udf AS 'SimpleUdf'\nUSING JAR '/tmp/SimpleUdf.jar';\n\n-- Verify that the function is in the registry.\n> SHOW USER FUNCTIONS;\nfunction\n------------------\ndefault.simple_udf\n\n-- Invoke the function. 
Every selected value should be incremented by 10.\n> SELECT simple_udf(c1) AS function_return_value FROM test;\nfunction_return_value\n---------------------\n11\n12\n\n-- Create a temporary function.\n> CREATE TEMPORARY FUNCTION simple_temp_udf AS 'SimpleUdf'\nUSING JAR '/tmp/SimpleUdf.jar';\n\n-- Verify that the newly created temporary function is in the registry.\n-- The temporary function does not have a qualified\n-- schema associated with it.\n> SHOW USER FUNCTIONS;\nfunction\n------------------\ndefault.simple_udf\nsimple_temp_udf\n\n-- 1. Modify `SimpleUdf`'s implementation to increment the supplied integral value by 20.\n-- import org.apache.hadoop.hive.ql.exec.UDF;\n\n-- public class SimpleUdfR extends UDF {\n-- public int evaluate(int value) {\n-- return value + 20;\n-- }\n-- }\n-- 2. Compile and place it in a JAR file called `SimpleUdfR.jar` in /tmp.\n\n-- Replace the implementation of `simple_udf`.\n> CREATE OR REPLACE FUNCTION simple_udf AS 'SimpleUdfR'\nUSING JAR '/tmp/SimpleUdfR.jar';\n\n-- Invoke the function. 
Every selected value should be incremented by 20.\n> SELECT simple_udf(c1) AS function_return_value FROM test;\nfunction_return_value\n---------------------\n21\n22\n\n```\n\n", "chunk_id": "fdce1951153d31e10c861022f911c518", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-function.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### CREATE FUNCTION (External)\n##### Related articles\n\n* [CREATE FUNCTION (SQL and Python)](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html)\n* [SHOW FUNCTIONS](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-show-functions.html)\n* [DESCRIBE FUNCTION](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-describe-function.html)\n* [DROP FUNCTION](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-drop-function.html)\n\n", "chunk_id": "c22a176afb1282035c6075e8744b8a50", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-function.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `+` (plus sign) unary operator\n\nReturns the value of `expr`. This function is a synonym for [positive function](https://docs.databricks.com/sql/language-manual/functions/positive.html).\n\n####### `+` (plus sign) unary operator\n######## Syntax\n\n```\n+ expr\n\n```\n\n####### `+` (plus sign) unary operator\n######## Arguments\n\n* `expr`: An expression that evaluates to a numeric or INTERVAL.\n\n####### `+` (plus sign) unary operator\n######## Returns\n\nThe result type matches the argument. 
\nThis function is a no-op.\n\n####### `+` (plus sign) unary operator\n######## Examples\n\n```\n> SELECT +(1);\n1\n\n> SELECT +(-1);\n-1\n\n> SELECT +INTERVAL '5' MONTH;\n0-5\n\n```\n\n####### `+` (plus sign) unary operator\n######## Related functions\n\n* [negative function](https://docs.databricks.com/sql/language-manual/functions/negative.html)\n* [abs function](https://docs.databricks.com/sql/language-manual/functions/abs.html)\n* [sign function](https://docs.databricks.com/sql/language-manual/functions/sign.html)\n* [- (minus sign) unary operator](https://docs.databricks.com/sql/language-manual/functions/minussignunary.html)\n* [positive function](https://docs.databricks.com/sql/language-manual/functions/positive.html)\n\n", "chunk_id": "4d34efde44efde9f7094247b8a1678e9", "url": "https://docs.databricks.com/sql/language-manual/functions/plussignunary.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `string` function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nCasts the value `expr` to `STRING`. This function is a synonym for `cast(expr AS STRING)`. 
See [cast function](https://docs.databricks.com/sql/language-manual/functions/cast.html) for details.\n\n####### `string` function\n######## Syntax\n\n```\nstring(expr)\n\n```\n\n####### `string` function\n######## Arguments\n\n* `expr`: An expression that can be cast to `STRING`.\n\n####### `string` function\n######## Returns\n\nThe result is a `STRING`.\n\n####### `string` function\n######## Examples\n\n```\n> SELECT string(5);\n5\n> SELECT string(current_date);\n2021-04-01\n\n```\n\n####### `string` function\n######## Related functions\n\n* [cast function](https://docs.databricks.com/sql/language-manual/functions/cast.html)\n\n", "chunk_id": "0d5de6840ffafba19ab8e8bef7e612eb", "url": "https://docs.databricks.com/sql/language-manual/functions/string.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Maintenance updates for Databricks Runtime (archived)\n\nThis archived page lists maintenance updates issued for Databricks Runtime releases that are no longer supported. To add a maintenance update to an existing cluster, restart the cluster. \nTo migrate to a supported Databricks Runtime version, see the [Databricks Runtime migration guide](https://dbrmg.databricks.com/?cloud_provider=aws). \nImportant \nThis documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See [Databricks Runtime release notes versions and compatibility](https://docs.databricks.com/release-notes/runtime/index.html). \nNote \nThis article contains references to the term *whitelist*, a term that Databricks does not use. 
When the term is removed from the software, we\u2019ll remove it from this article.\n\n", "chunk_id": "0d0adf6f31e2b12fbebc6ae6852ae7f8", "url": "https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Maintenance updates for Databricks Runtime (archived)\n##### Databricks Runtime releases\n\nMaintenance updates by release: \n* [Databricks Runtime 14.0](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-140)\n* [Databricks Runtime 13.1](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-131)\n* [Databricks Runtime 12.2 LTS](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-122-lts)\n* [Databricks Runtime 11.3 LTS](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-113-lts)\n* [Databricks Runtime 10.4 LTS](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-104-lts)\n* [Databricks Runtime 9.1 LTS](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-91-lts)\n* [Databricks Runtime 13.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-130-unsupported)\n* [Databricks Runtime 12.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-121-unsupported)\n* [Databricks Runtime 12.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-120-unsupported)\n* [Databricks Runtime 11.2 
(unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-112-unsupported)\n* [Databricks Runtime 11.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-111-unsupported)\n* [Databricks Runtime 11.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-110-unsupported)\n* [Databricks Runtime 10.5 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-105-unsupported)\n* [Databricks Runtime 10.3 (Unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-103-unsupported)\n* [Databricks Runtime 10.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-102-unsupported)\n* [Databricks Runtime 10.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-101-unsupported)\n* [Databricks Runtime 10.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-100-unsupported)\n* [Databricks Runtime 9.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-90-unsupported)\n* [Databricks Runtime 8.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-84-unsupported)\n* [Databricks Runtime 8.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-83-unsupported)\n* [Databricks Runtime 8.2 
(unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-82-unsupported)\n* [Databricks Runtime 8.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-81-unsupported)\n* [Databricks Runtime 8.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-80-unsupported)\n* [Databricks Runtime 7.6 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-76-unsupported)\n* [Databricks Runtime 7.5 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-75-unsupported)\n* [Databricks Runtime 7.3 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-73-lts-unsupported)\n* [Databricks Runtime 6.4 Extended Support (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-64-extended-support-unsupported)\n* [Databricks Runtime 5.5 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-55-lts-unsupported)\n* [Databricks Light 2.4 Extended Support](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-light-24-extended-support)\n* [Databricks Runtime 7.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-74-unsupported)\n* [Databricks Runtime 7.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-72-unsupported)\n* [Databricks Runtime 7.1 
(unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-71-unsupported)\n* [Databricks Runtime 7.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-70-unsupported)\n* [Databricks Runtime 6.6 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-66-unsupported)\n* [Databricks Runtime 6.5 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-65-unsupported)\n* [Databricks Runtime 6.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-63-unsupported)\n* [Databricks Runtime 6.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-62-unsupported)\n* [Databricks Runtime 6.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-61-unsupported)\n* [Databricks Runtime 6.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-60-unsupported)\n* [Databricks Runtime 5.4 ML (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-54-ml-unsupported)\n* [Databricks Runtime 5.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-54-unsupported)\n* [Databricks Runtime 5.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-53-unsupported)\n* [Databricks Runtime 5.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-52-unsupported)\n* 
[Databricks Runtime 5.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-51-unsupported)\n* [Databricks Runtime 5.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-50-unsupported)\n* [Databricks Runtime 4.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-43-unsupported)\n* [Databricks Runtime 4.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-42-unsupported)\n* [Databricks Runtime 4.1 ML (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-41-ml-unsupported)\n* [Databricks Runtime 4.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-41-unsupported)\n* [Databricks Runtime 4.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-40-unsupported)\n* [Databricks Runtime 3.5 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-35-lts-unsupported)\n* [Databricks Runtime 3.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-34-unsupported)\n* [Databricks Runtime 3.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-33-unsupported)\n* [Databricks Runtime 3.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#databricks-runtime-32-unsupported)\n* [2.1.1-db6 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#211-db6-unsupported) 
\nFor the maintenance updates on supported Databricks Runtime versions, see [Databricks Runtime maintenance updates](https://docs.databricks.com/release-notes/runtime/maintenance-updates.html). \n### [Databricks Runtime 14.0](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id1) \nSee [Databricks Runtime 14.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/14.0.html). \n* February 8, 2024 \n+ [[SPARK-46396]](https://issues.apache.org/jira/browse/SPARK-46396) Timestamp inference should not throw exception.\n+ [[SPARK-46794]](https://issues.apache.org/jira/browse/SPARK-46794) Remove subqueries from LogicalRDD constraints.\n+ [[SPARK-45182]](https://issues.apache.org/jira/browse/SPARK-45182) Ignore task completion from old stage after retrying parent-indeterminate stage as determined by checksum.\n+ [[SPARK-46933]](https://issues.apache.org/jira/browse/SPARK-46933) Add query execution time metric to connectors which use JDBCRDD.\n+ [[SPARK-45957]](https://issues.apache.org/jira/browse/SPARK-45957) Avoid generating execution plan for non-executable commands.\n+ [[SPARK-46861]](https://issues.apache.org/jira/browse/SPARK-46861) Avoid Deadlock in DAGScheduler.\n+ [[SPARK-46930]](https://issues.apache.org/jira/browse/SPARK-46930) Add support for a custom prefix for Union type fields in Avro.\n+ [[SPARK-46941]](https://issues.apache.org/jira/browse/SPARK-46941) Can\u2019t insert window group limit node for top-k computation if contains SizeBasedWindowFunction.\n+ [[SPARK-45582]](https://issues.apache.org/jira/browse/SPARK-45582) Ensure that store instance is not used after calling commit within output mode streaming aggregation.\n+ Operating system security updates.\n* January 31, 2024 \n+ [[SPARK-46541]](https://issues.apache.org/jira/browse/SPARK-46541) Fix the ambiguous column reference in self join.\n+ [[SPARK-46676]](https://issues.apache.org/jira/browse/SPARK-46676) dropDuplicatesWithinWatermark 
should not fail on canonicalization of the plan.\n+ [[SPARK-46769]](https://issues.apache.org/jira/browse/SPARK-46769) Refine timestamp related schema inference.\n+ [[SPARK-45498]](https://issues.apache.org/jira/browse/SPARK-45498) Followup: Ignore task completion from old stage attempts.\n+ Revert [[SPARK-46769]](https://issues.apache.org/jira/browse/SPARK-46769) Refine timestamp related schema inference.\n+ [[SPARK-46383]](https://issues.apache.org/jira/browse/SPARK-46383) Reduce Driver Heap Usage by Reducing the Lifespan of `TaskInfo.accumulables()`.\n+ [[SPARK-46633]](https://issues.apache.org/jira/browse/SPARK-46633) Fix Avro reader to handle zero-length blocks.\n+ [[SPARK-46677]](https://issues.apache.org/jira/browse/SPARK-46677) Fix `dataframe[\"*\"]` resolution.\n+ [[SPARK-46684]](https://issues.apache.org/jira/browse/SPARK-46684) Fix CoGroup.applyInPandas/Arrow to pass arguments properly.\n+ [[SPARK-46763]](https://issues.apache.org/jira/browse/SPARK-46763) Fix assertion failure in ReplaceDeduplicateWithAggregate for duplicate attributes.\n+ [[SPARK-46610]](https://issues.apache.org/jira/browse/SPARK-46610) Create table should throw exception when no value for a key in options.\n+ Operating system security updates.\n* January 17, 2024 \n+ The `shuffle` node of the explain plan returned by a Photon query is updated to add the `causedBroadcastJoinBuildOOM=true` flag when an out-of-memory error occurs during a shuffle that is part of a broadcast join.\n+ To avoid increased latency when communicating over TLSv1.3, this maintenance release includes a patch to the JDK 8 installation to fix JDK bug JDK-8293562.\n+ [[SPARK-46394]](https://issues.apache.org/jira/browse/SPARK-46394) Fix spark.catalog.listDatabases() issues on schemas with special characters when `spark.sql.legacy.keepCommandOutputSchema` set to true.\n+ [[SPARK-46250]](https://issues.apache.org/jira/browse/SPARK-46250) Deflake test\\_parity\\_listener.\n+ 
[[SPARK-45814]](https://issues.apache.org/jira/browse/SPARK-45814) Make ArrowConverters.createEmptyArrowBatch call close() to avoid memory leak.\n+ [[SPARK-46173]](https://issues.apache.org/jira/browse/SPARK-46173) Skipping trimAll call during date parsing.\n+ [[SPARK-46484]](https://issues.apache.org/jira/browse/SPARK-46484) Make `resolveOperators` helper functions keep the plan id.\n+ [[SPARK-46466]](https://issues.apache.org/jira/browse/SPARK-46466) Vectorized parquet reader should never do rebase for timestamp ntz.\n+ [[SPARK-46056]](https://issues.apache.org/jira/browse/SPARK-46056) Fix Parquet vectorized read NPE with byteArrayDecimalType default value.\n+ [[SPARK-46058]](https://issues.apache.org/jira/browse/SPARK-46058) Add separate flag for privateKeyPassword.\n+ [[SPARK-46478]](https://issues.apache.org/jira/browse/SPARK-46478) Revert SPARK-43049 to use oracle varchar(255) for string.\n+ [[SPARK-46132]](https://issues.apache.org/jira/browse/SPARK-46132) Support key password for JKS keys for RPC SSL.\n+ [[SPARK-46417]](https://issues.apache.org/jira/browse/SPARK-46417) Do not fail when calling hive.getTable and throwException is false.\n+ [[SPARK-46261]](https://issues.apache.org/jira/browse/SPARK-46261) `DataFrame.withColumnsRenamed` should keep the dict/map ordering.\n+ [[SPARK-46370]](https://issues.apache.org/jira/browse/SPARK-46370) Fix bug when querying from table after changing column defaults.\n+ [[SPARK-46609]](https://issues.apache.org/jira/browse/SPARK-46609) Avoid exponential explosion in PartitioningPreservingUnaryExecNode.\n+ [[SPARK-46600]](https://issues.apache.org/jira/browse/SPARK-46600) Move shared code between SqlConf and SqlApiConf to SqlApiConfHelper.\n+ [[SPARK-46538]](https://issues.apache.org/jira/browse/SPARK-46538) Fix the ambiguous column reference issue in `ALSModel.transform`.\n+ [[SPARK-46337]](https://issues.apache.org/jira/browse/SPARK-46337) Make `CTESubstitution` retain the `PLAN_ID_TAG`.\n+ 
[[SPARK-46602]](https://issues.apache.org/jira/browse/SPARK-46602) Propagate `allowExisting` in view creation when the view/table does not exists.\n+ [[SPARK-46260]](https://issues.apache.org/jira/browse/SPARK-46260) `DataFrame.withColumnsRenamed` should respect the dict ordering.\n+ [[SPARK-46145]](https://issues.apache.org/jira/browse/SPARK-46145) spark.catalog.listTables does not throw exception when the table or view is not found.\n* December 14, 2023 \n+ Fixed an issue where escaped underscores in *getColumns* operations originating from JDBC or ODBC clients were handled incorrectly and interpreted as wildcards.\n+ [[SPARK-46255]](https://issues.apache.org/jira/browse/SPARK-46255) Support complex type -> string conversion.\n+ [[SPARK-46028]](https://issues.apache.org/jira/browse/SPARK-46028) Make `Column.__getitem__` accept input column.\n+ [[SPARK-45920]](https://issues.apache.org/jira/browse/SPARK-45920) group by ordinal should be idempotent.\n+ [[SPARK-45433]](https://issues.apache.org/jira/browse/SPARK-45433) Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat.\n+ [[SPARK-45509]](https://issues.apache.org/jira/browse/SPARK-45509) Fix df column reference behavior for Spark Connect.\n+ Operating system security updates.\n* November 29, 2023 \n+ Installed a new package, `pyarrow-hotfix` to remediate a PyArrow RCE vulnerability.\n+ Fixed an issue where escaped underscores in `getColumns` operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.\n+ When ingesting CSV data using Auto Loader or Streaming Tables, large CSV files are now splittable and can be processed in parallel during both schema inference and data processing.\n+ Spark-snowflake connector is upgraded to 2.12.0.\n+ [[SPARK-45859]](https://issues.apache.org/jira/browse/SPARK-45859) Made UDF objects in `ml.functions` lazy.\n+ Revert [[SPARK-45592]](https://issues.apache.org/jira/browse/SPARK-45592).\n+ 
[[SPARK-45892]](https://issues.apache.org/jira/browse/SPARK-45892) Refactor optimizer plan validation to decouple `validateSchemaOutput` and `validateExprIdUniqueness`.\n+ [[SPARK-45592]](https://issues.apache.org/jira/browse/SPARK-45592) Fixed correctness issue in AQE with `InMemoryTableScanExec`.\n+ [[SPARK-45620]](https://issues.apache.org/jira/browse/SPARK-45620) APIs related to Python UDF now use camelCase.\n+ [[SPARK-44784]](https://issues.apache.org/jira/browse/SPARK-44784) Made SBT testing hermetic.\n+ [[SPARK-45770]](https://issues.apache.org/jira/browse/SPARK-45770) Fixed column resolution with `DataFrameDropColumns` for `Dataframe.drop`.\n+ [[SPARK-45544]](https://issues.apache.org/jira/browse/SPARK-45544) Integrated SSL support into `TransportContext`.\n+ [[SPARK-45730]](https://issues.apache.org/jira/browse/SPARK-45730) Improved time constraints for `ReloadingX509TrustManagerSuite`.\n+ Operating system security updates.\n* November 10, 2023 \n+ Changed data feed queries on Unity Catalog Streaming Tables and Materialized Views to display error messages.\n+ [[SPARK-45545]](https://issues.apache.org/jira/browse/SPARK-45545) `SparkTransportConf` inherits `SSLOptions` upon creation.\n+ [[SPARK-45584]](https://issues.apache.org/jira/browse/SPARK-45584) Fixed subquery run failure with `TakeOrderedAndProjectExec`.\n+ [[SPARK-45427]](https://issues.apache.org/jira/browse/SPARK-45427) Added RPC SSL settings to `SSLOptions` and `SparkTransportConf`.\n+ [[SPARK-45541]](https://issues.apache.org/jira/browse/SPARK-45541) Added `SSLFactory`.\n+ [[SPARK-45430]](https://issues.apache.org/jira/browse/SPARK-45430) `FramelessOffsetWindowFunction` no longer fails when `IGNORE NULLS` and `offset > rowCount`.\n+ [[SPARK-45429]](https://issues.apache.org/jira/browse/SPARK-45429) Added helper classes for SSL RPC communication.\n+ [[SPARK-44219]](https://issues.apache.org/jira/browse/SPARK-44219) Added extra per-rule validations for optimization rewrites.\n+ 
[[SPARK-45543]](https://issues.apache.org/jira/browse/SPARK-45543) Fixed an issue where `InferWindowGroupLimit` generated an error if the other window functions do not have the same window frame as the rank-like functions.\n+ Operating system security updates.\n* October 23, 2023 \n+ [[SPARK-45426]](https://issues.apache.org/jira/browse/SPARK-45426) Added support for `ReloadingX509TrustManager`.\n+ [[SPARK-45396]](https://issues.apache.org/jira/browse/SPARK-45396) Added doc entry for `PySpark.ml.connect` module, and added `Evaluator` to `__all__` at `ml.connect`.\n+ [[SPARK-45256]](https://issues.apache.org/jira/browse/SPARK-45256) Fixed an issue where `DurationWriter` failed when writing more values than initial capacity.\n+ [[SPARK-45279]](https://issues.apache.org/jira/browse/SPARK-45279) Attached `plan_id` to all logical plans.\n+ [[SPARK-45250]](https://issues.apache.org/jira/browse/SPARK-45250) Added support for stage-level task resource profile for yarn clusters when dynamic allocation is turned off.\n+ [[SPARK-45182]](https://issues.apache.org/jira/browse/SPARK-45182) Added support for rolling back shuffle map stage so all stage tasks can be retried when the stage output is indeterminate.\n+ [[SPARK-45419]](https://issues.apache.org/jira/browse/SPARK-45419) Avoid reusing `rocksdb sst` files in a different `rocksdb` instance by removing file version map entries of larger versions.\n+ [[SPARK-45386]](https://issues.apache.org/jira/browse/SPARK-45386) Fixed an issue where `StorageLevel.NONE` would incorrectly return 0.\n+ Operating system security updates.\n* October 13, 2023 \n+ Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.\n+ The `array_insert` function is 1-based for positive and negative indexes, while before, it was 0-based for negative indexes. It now inserts a new element at the end of input arrays for the index -1. 
To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.\n+ Databricks no longer ignores corrupt files when a CSV schema inference with Auto Loader has enabled `ignoreCorruptFiles`.\n+ [[SPARK-45227]](https://issues.apache.org/jira/browse/SPARK-45227) Fixed a subtle thread-safety issue with `CoarseGrainedExecutorBackend`.\n+ [[SPARK-44658]](https://issues.apache.org/jira/browse/SPARK-44658) `ShuffleStatus.getMapStatus` should return `None` instead of `Some(null)`.\n+ [[SPARK-44910]](https://issues.apache.org/jira/browse/SPARK-44910) `Encoders.bean` does not support superclasses with generic type arguments.\n+ [[SPARK-45346]](https://issues.apache.org/jira/browse/SPARK-45346) Parquet schema inference respects case-sensitive flags when merging schema.\n+ Revert [[SPARK-42946]](https://issues.apache.org/jira/browse/SPARK-42946).\n+ [[SPARK-42205]](https://issues.apache.org/jira/browse/SPARK-42205) Updated the JSON protocol to remove Accumulables logging in a task or stage start events.\n+ [[SPARK-45360]](https://issues.apache.org/jira/browse/SPARK-45360) Spark session builder supports initialization from `SPARK_REMOTE`.\n+ [[SPARK-45316]](https://issues.apache.org/jira/browse/SPARK-45316) Add new parameters `ignoreCorruptFiles`/`ignoreMissingFiles` to `HadoopRDD` and `NewHadoopRDD`.\n+ [[SPARK-44909]](https://issues.apache.org/jira/browse/SPARK-44909) Skip running the torch distributor log streaming server when it is not available.\n+ [[SPARK-45084]](https://issues.apache.org/jira/browse/SPARK-45084) `StateOperatorProgress` now uses accurate shuffle partition number.\n+ [[SPARK-45371]](https://issues.apache.org/jira/browse/SPARK-45371) Fixed shading issues in the Spark Connect Scala Client.\n+ [[SPARK-45178]](https://issues.apache.org/jira/browse/SPARK-45178) Fallback to running a single batch for `Trigger.AvailableNow` with unsupported sources rather than using the wrapper.\n+ 
[[SPARK-44840]](https://issues.apache.org/jira/browse/SPARK-44840) Make `array_insert()` 1-based for negative indexes.\n+ [[SPARK-44551]](https://issues.apache.org/jira/browse/SPARK-44551) Edited comments to sync with OSS.\n+ [[SPARK-45078]](https://issues.apache.org/jira/browse/SPARK-45078) The `ArrayInsert` function now makes explicit casting when the element type does not equal the derived component type.\n+ [[SPARK-45339]](https://issues.apache.org/jira/browse/SPARK-45339) PySpark now logs retry errors.\n+ [[SPARK-45057]](https://issues.apache.org/jira/browse/SPARK-45057) Avoid acquiring read lock when `keepReadLock` is false.\n+ [[SPARK-44908]](https://issues.apache.org/jira/browse/SPARK-44908) Fixed cross-validator `foldCol` param functionality.\n+ Operating system security updates. \n### [Databricks Runtime 13.1](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id2) \nSee [Databricks Runtime 13.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/13.1.html). 
\n* November 29, 2023 \n+ Fixed an issue where escaped underscores in `getColumns` operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.\n+ [[SPARK-44846]](https://issues.apache.org/jira/browse/SPARK-44846) Removed complex grouping expressions after `RemoveRedundantAggregates`.\n+ [[SPARK-43802]](https://issues.apache.org/jira/browse/SPARK-43802) Fixed an issue where codegen for unhex and unbase64 expressions would fail.\n+ [[SPARK-43718]](https://issues.apache.org/jira/browse/SPARK-43718) Fixed nullability for keys in `USING` joins.\n+ Operating system security updates.\n* November 14, 2023 \n+ Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.\n+ Changed data feed queries on Unity Catalog Streaming Tables and Materialized Views to display error messages.\n+ [[SPARK-45584]](https://issues.apache.org/jira/browse/SPARK-45584) Fixed subquery run failure with `TakeOrderedAndProjectExec`.\n+ [[SPARK-45430]](https://issues.apache.org/jira/browse/SPARK-45430) `FramelessOffsetWindowFunction` no longer fails when `IGNORE NULLS` and `offset > rowCount`.\n+ [[SPARK-45543]](https://issues.apache.org/jira/browse/SPARK-45543) Fixed an issue where `InferWindowGroupLimit` caused an issue if the other window functions didn\u2019t have the same window frame as the rank-like functions.\n+ Operating system security updates.\n* October 24, 2023 \n+ [[SPARK-43799]](https://issues.apache.org/jira/browse/SPARK-43799) Added descriptor binary option to PySpark `Protobuf` API.\n+ Revert [[SPARK-42946]](https://issues.apache.org/jira/browse/SPARK-42946).\n+ [[SPARK-45346]](https://issues.apache.org/jira/browse/SPARK-45346) Parquet schema inference now respects case-sensitive flag when merging a schema.\n+ Operating system security updates.\n* October 13, 2023 \n+ Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.\n+ No longer ignoring corrupt files when `ignoreCorruptFiles` is 
enabled during CSV schema inference with Auto Loader.\n+ [[SPARK-44658]](https://issues.apache.org/jira/browse/SPARK-44658) `ShuffleStatus.getMapStatus` returns `None` instead of `Some(null)`.\n+ [[SPARK-45178]](https://issues.apache.org/jira/browse/SPARK-45178) Fallback to running a single batch for `Trigger.AvailableNow` with unsupported sources rather than using the wrapper.\n+ [[SPARK-42205]](https://issues.apache.org/jira/browse/SPARK-42205) Updated the JSON protocol to remove Accumulables logging in a task or stage start events.\n+ Operating system security updates.\n* September 12, 2023 \n+ [[SPARK-44718]](https://issues.apache.org/jira/browse/SPARK-44718) Match `ColumnVector` memory-mode config default to `OffHeapMemoryMode` config value.\n+ [[SPARK-44878]](https://issues.apache.org/jira/browse/SPARK-44878) Turned off strict limit for `RocksDB` write manager to avoid insertion exception on cache complete.\n+ Miscellaneous fixes.\n* August 30, 2023 \n+ [[SPARK-44871]](https://issues.apache.org/jira/browse/SPARK-44871) Fixed `percentile_disc` behavior.\n+ [[SPARK-44714]](https://issues.apache.org/jira/browse/SPARK-44714) Ease restriction of LCA resolution regarding queries.\n+ [[SPARK-44245]](https://issues.apache.org/jira/browse/SPARK-44245) `PySpark.sql.dataframe sample()` doc tests are now illustrative-only.\n+ [[SPARK-44818]](https://issues.apache.org/jira/browse/SPARK-44818) Fixed race for pending task interrupt issued before `taskThread` is initialized.\n+ Operating system security updates.\n* August 15, 2023 \n+ [[SPARK-44485]](https://issues.apache.org/jira/browse/SPARK-44485) Optimized `TreeNode.generateTreeString`.\n+ [[SPARK-44643]](https://issues.apache.org/jira/browse/SPARK-44643) Fixed `Row.__repr__` when the row is empty.\n+ [[SPARK-44504]](https://issues.apache.org/jira/browse/SPARK-44504) Maintenance task now cleans up loaded providers on stop error.\n+ [[SPARK-44479]](https://issues.apache.org/jira/browse/SPARK-44479) Fixed `protobuf` 
conversion from an empty struct type.\n+ [[SPARK-44464]](https://issues.apache.org/jira/browse/SPARK-44464) Fixed `applyInPandasWithStatePythonRunner` to output rows that have `Null` as the first column value.\n+ Miscellaneous fixes.\n* July 27, 2023 \n+ Fixed an issue where `dbutils.fs.ls()` returned `INVALID_PARAMETER_VALUE.LOCATION_OVERLAP` when called for a storage location path which clashed with other external or managed storage location.\n+ [[SPARK-44199]](https://issues.apache.org/jira/browse/SPARK-44199) `CacheManager` no longer refreshes the `fileIndex` unnecessarily.\n+ [[SPARK-44448]](https://issues.apache.org/jira/browse/SPARK-44448) Fixed wrong results bug from `DenseRankLimitIterator` and `InferWindowGroupLimit`.\n+ Operating system security updates.\n* July 24, 2023 \n+ Revert [[SPARK-42323]](https://issues.apache.org/jira/browse/SPARK-42323).\n+ [[SPARK-41848]](https://issues.apache.org/jira/browse/SPARK-41848) Fixed task over-schedule issue with `TaskResourceProfile`.\n+ [[SPARK-44136]](https://issues.apache.org/jira/browse/SPARK-44136) Fixed an issue where `StateManager` would get materialized in an executor instead of the driver in `FlatMapGroupsWithStateExec`.\n+ [[SPARK-44337]](https://issues.apache.org/jira/browse/SPARK-44337) Fixed an issue where any field set to `Any.getDefaultInstance` caused parse errors.\n+ Operating system security updates.\n* June 27, 2023 \n+ Operating system security updates.\n* June 15, 2023 \n+ Photonized `approx_count_distinct`.\n+ JSON parser in `failOnUnknownFields` mode now drops the record in `DROPMALFORMED` mode and fails directly in `FAILFAST` mode.\n+ Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.\n+ The `PubSubRecord` attributes field is stored as JSON instead of the string from a Scala map for more straightforward serialization and deserialization.\n+ The `EXPLAIN EXTENDED` command now returns the result cache eligibility of the query.\n+ Improve the performance of incremental 
updates with `SHALLOW CLONE` Iceberg and Parquet.\n+ [[SPARK-43032]](https://issues.apache.org/jira/browse/SPARK-43032) Python SQM bug fix.\n+ [[SPARK-43404]](https://issues.apache.org/jira/browse/SPARK-43404) Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.\n+ [[SPARK-43340]](https://issues.apache.org/jira/browse/SPARK-43340) Handle missing stack-trace field in eventlogs.\n+ [[SPARK-43527]](https://issues.apache.org/jira/browse/SPARK-43527) Fixed `catalog.listCatalogs` in PySpark.\n+ [[SPARK-43541]](https://issues.apache.org/jira/browse/SPARK-43541) Propagate all `Project` tags in resolving of expressions and missing columns.\n+ [[SPARK-43300]](https://issues.apache.org/jira/browse/SPARK-43300) `NonFateSharingCache` wrapper for Guava Cache.\n+ [[SPARK-43378]](https://issues.apache.org/jira/browse/SPARK-43378) Properly close stream objects in `deserializeFromChunkedBuffer`.\n+ [[SPARK-42852]](https://issues.apache.org/jira/browse/SPARK-42852) Revert `NamedLambdaVariable` related changes from `EquivalentExpressions`.\n+ [[SPARK-43779]](https://issues.apache.org/jira/browse/SPARK-43779) `ParseToDate` now loads `EvalMode` in the main thread.\n+ [[SPARK-43413]](https://issues.apache.org/jira/browse/SPARK-43413) Fix `IN` subquery `ListQuery` nullability.\n+ [[SPARK-43889]](https://issues.apache.org/jira/browse/SPARK-43889) Add check for column name for `__dir__()` to filter out error-prone column names.\n+ [[SPARK-43043]](https://issues.apache.org/jira/browse/SPARK-43043) Improved the performance of `MapOutputTracker.updateMapOutput`.\n+ [[SPARK-43522]](https://issues.apache.org/jira/browse/SPARK-43522) Fixed creating struct column name with index of array.\n+ [[SPARK-43457]](https://issues.apache.org/jira/browse/SPARK-43457) Augment user agent with OS, Python and Spark versions.\n+ [[SPARK-43286]](https://issues.apache.org/jira/browse/SPARK-43286) Updated `aes_encrypt` CBC mode to generate random IVs.\n+ 
[[SPARK-42851]](https://issues.apache.org/jira/browse/SPARK-42851) Guard `EquivalentExpressions.addExpr()` with `supportedExpression()`.\n+ Revert [[SPARK-43183]](https://issues.apache.org/jira/browse/SPARK-43183).\n+ Operating system security updates. \n### [Databricks Runtime 12.2 LTS](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id3) \nSee [Databricks Runtime 12.2 LTS](https://docs.databricks.com/release-notes/runtime/12.2lts.html). \n* November 29, 2023 \n+ Fixed an issue where escaped underscores in `getColumns` operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.\n+ [[SPARK-42205]](https://issues.apache.org/jira/browse/SPARK-42205) Removed logging accumulables in `Stage` and `Task` start events.\n+ [[SPARK-44846]](https://issues.apache.org/jira/browse/SPARK-44846) Removed complex grouping expressions after `RemoveRedundantAggregates`.\n+ [[SPARK-43718]](https://issues.apache.org/jira/browse/SPARK-43718) Fixed nullability for keys in `USING` joins.\n+ [[SPARK-45544]](https://issues.apache.org/jira/browse/SPARK-45544) Integrated SSL support into `TransportContext`.\n+ [[SPARK-43973]](https://issues.apache.org/jira/browse/SPARK-43973) Structured Streaming UI now displays failed queries correctly.\n+ [[SPARK-45730]](https://issues.apache.org/jira/browse/SPARK-45730) Improved time constraints for `ReloadingX509TrustManagerSuite`.\n+ [[SPARK-45859]](https://issues.apache.org/jira/browse/SPARK-45859) Made UDF objects in `ml.functions` lazy.\n+ Operating system security updates.\n* November 14, 2023 \n+ Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.\n+ [[SPARK-45545]](https://issues.apache.org/jira/browse/SPARK-45545) `SparkTransportConf` inherits `SSLOptions` upon creation.\n+ [[SPARK-45427]](https://issues.apache.org/jira/browse/SPARK-45427) Added RPC SSL settings to `SSLOptions` and `SparkTransportConf`.\n+ 
[[SPARK-45584]](https://issues.apache.org/jira/browse/SPARK-45584) Fixed subquery run failure with `TakeOrderedAndProjectExec`.\n+ [[SPARK-45541]](https://issues.apache.org/jira/browse/SPARK-45541) Added `SSLFactory`.\n+ [[SPARK-45430]](https://issues.apache.org/jira/browse/SPARK-45430) `FramelessOffsetWindowFunction` no longer fails when `IGNORE NULLS` and `offset > rowCount`.\n+ [[SPARK-45429]](https://issues.apache.org/jira/browse/SPARK-45429) Added helper classes for SSL RPC communication.\n+ Operating system security updates.\n* October 24, 2023 \n+ [[SPARK-45426]](https://issues.apache.org/jira/browse/SPARK-45426) Added support for `ReloadingX509TrustManager`.\n+ Miscellaneous fixes.\n* October 13, 2023 \n+ Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.\n+ [[SPARK-42553]](https://issues.apache.org/jira/browse/SPARK-42553) Ensure at least one time unit after interval.\n+ [[SPARK-45346]](https://issues.apache.org/jira/browse/SPARK-45346) Parquet schema inference respects case sensitive flag when merging schema.\n+ [[SPARK-45178]](https://issues.apache.org/jira/browse/SPARK-45178) Fallback to running a single batch for `Trigger.AvailableNow` with unsupported sources rather than using the wrapper.\n+ [[SPARK-45084]](https://issues.apache.org/jira/browse/SPARK-45084) `StateOperatorProgress` to use an accurate, adequate shuffle partition number.\n* September 12, 2023 \n+ [[SPARK-44873]](https://issues.apache.org/jira/browse/SPARK-44873) Added support for `alter view` with nested columns in the Hive client.\n+ [[SPARK-44718]](https://issues.apache.org/jira/browse/SPARK-44718) Match `ColumnVector` memory-mode config default to `OffHeapMemoryMode` config value.\n+ [[SPARK-43799]](https://issues.apache.org/jira/browse/SPARK-43799) Added descriptor binary option to PySpark `Protobuf` API.\n+ Miscellaneous fixes.\n* August 30, 2023 \n+ [[SPARK-44485]](https://issues.apache.org/jira/browse/SPARK-44485) Optimized `TreeNode.generateTreeString`.\n+ 
[[SPARK-44818]](https://issues.apache.org/jira/browse/SPARK-44818) Fixed race for pending task interrupt issued before `taskThread` is initialized.\n+ [[SPARK-44871]](https://issues.apache.org/jira/browse/SPARK-44871)[11.3-13.0] Fixed `percentile_disc` behavior.\n+ [[SPARK-44714]](https://issues.apache.org/jira/browse/SPARK-44714) Eased restriction of LCA resolution regarding queries.\n+ Operating system security updates.\n* August 15, 2023 \n+ [[SPARK-44504]](https://issues.apache.org/jira/browse/SPARK-44504) Maintenance task cleans up loaded providers on stop error.\n+ [[SPARK-44464]](https://issues.apache.org/jira/browse/SPARK-44464) Fixed `applyInPandasWithStatePythonRunner` to output rows that have `Null` as the first column value.\n+ Operating system security updates.\n* July 29, 2023 \n+ Fixed an issue where `dbutils.fs.ls()` returned `INVALID_PARAMETER_VALUE.LOCATION_OVERLAP` when called for a storage location path which clashed with other external or managed storage location.\n+ [[SPARK-44199]](https://issues.apache.org/jira/browse/SPARK-44199) `CacheManager` no longer refreshes the `fileIndex` unnecessarily.\n+ Operating system security updates.\n* July 24, 2023 \n+ [[SPARK-44337]](https://issues.apache.org/jira/browse/SPARK-44337) Fixed an issue where any field set to `Any.getDefaultInstance` caused parse errors.\n+ [[SPARK-44136]](https://issues.apache.org/jira/browse/SPARK-44136) Fixed an issue where `StateManager` would get materialized in an executor instead of the driver in `FlatMapGroupsWithStateExec`.\n+ Operating system security updates.\n* June 23, 2023 \n+ Operating system security updates.\n* June 15, 2023 \n+ Photonized `approx_count_distinct`.\n+ Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.\n+ [[SPARK-43779]](https://issues.apache.org/jira/browse/SPARK-43779) `ParseToDate` now loads `EvalMode` in the main thread.\n+ [[SPARK-43156]](https://issues.apache.org/jira/browse/SPARK-43156)[[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098) Extended 
scalar subquery count error test with `decorrelateInnerQuery` turned off.\n+ Operating system security updates.\n* June 2, 2023 \n+ The JSON parser in `failOnUnknownFields` mode drops a record in `DROPMALFORMED` mode and fails directly in `FAILFAST` mode.\n+ Improve the performance of incremental updates with `SHALLOW CLONE` Iceberg and Parquet.\n+ Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.\n+ [[SPARK-43404]](https://issues.apache.org/jira/browse/SPARK-43404) Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.\n+ [[SPARK-43413]](https://issues.apache.org/jira/browse/SPARK-43413)[11.3-13.0] Fixed `IN` subquery `ListQuery` nullability.\n+ [[SPARK-43522]](https://issues.apache.org/jira/browse/SPARK-43522) Fixed creating struct column name with index of array.\n+ [[SPARK-43541]](https://issues.apache.org/jira/browse/SPARK-43541) Propagate all `Project` tags in resolving of expressions and missing columns.\n+ [[SPARK-43527]](https://issues.apache.org/jira/browse/SPARK-43527) Fixed `catalog.listCatalogs` in PySpark.\n+ [[SPARK-43123]](https://issues.apache.org/jira/browse/SPARK-43123) Internal field metadata no longer leaks to catalogs.\n+ [[SPARK-43340]](https://issues.apache.org/jira/browse/SPARK-43340) Fixed missing stack trace field in eventlogs.\n+ [[SPARK-42444]](https://issues.apache.org/jira/browse/SPARK-42444) `DataFrame.drop` now handles duplicated columns correctly.\n+ [[SPARK-42937]](https://issues.apache.org/jira/browse/SPARK-42937) `PlanSubqueries` now sets `InSubqueryExec#shouldBroadcast` to true.\n+ [[SPARK-43286]](https://issues.apache.org/jira/browse/SPARK-43286) Updated `aes_encrypt` CBC mode to generate random IVs.\n+ [[SPARK-43378]](https://issues.apache.org/jira/browse/SPARK-43378) 
Properly close stream objects in `deserializeFromChunkedBuffer`.\n* May 17, 2023 \n+ Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.\n+ If an Avro file was read with just the `failOnUnknownFields` option or with Auto Loader in the `failOnNewColumns` schema evolution mode, columns that have different data types would be read as `null` instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the `rescuedDataColumn` option.\n+ Auto Loader now does the following.\n+ - Correctly reads and no longer rescues `Integer`, `Short`, and `Byte` types if one of these data types is provided, but the Avro file suggests one of the other two types.\n+ - Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.\n+ - Prevents reading `Decimal` types with lower precision.\n+ [[SPARK-43172]](https://issues.apache.org/jira/browse/SPARK-43172) Exposes host and token from Spark Connect client.\n+ [[SPARK-43293]](https://issues.apache.org/jira/browse/SPARK-43293) `__qualified_access_only` is ignored in normal columns.\n+ [[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098) Fixed correctness `COUNT` bug when scalar subquery has a group by clause.\n+ [[SPARK-43085]](https://issues.apache.org/jira/browse/SPARK-43085) Support for column `DEFAULT` assignment for multi-part table names.\n+ [[SPARK-43190]](https://issues.apache.org/jira/browse/SPARK-43190) `ListQuery.childOutput` is now consistent with secondary output.\n+ [[SPARK-43192]](https://issues.apache.org/jira/browse/SPARK-43192) Removed user agent charset validation.\n+ Operating system security updates.\n* April 25, 2023 \n+ If a Parquet file was read with just the `failOnUnknownFields` option or with Auto Loader in the 
`failOnNewColumns` schema evolution mode, columns that had different data types would be read as `null` instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the `rescuedDataColumn` option.\n+ Auto Loader now correctly reads and no longer rescues `Integer`, `Short`, and `Byte` types if one of these data types is provided, but the Parquet file suggests one of the other two types. When the rescued data column was previously enabled, the data type mismatch would cause columns to be saved even though they were readable.\n+ [[SPARK-43009]](https://issues.apache.org/jira/browse/SPARK-43009) Parameterized `sql()` with `Any` constants\n+ [[SPARK-42406]](https://issues.apache.org/jira/browse/SPARK-42406) Terminate Protobuf recursive fields by dropping the field\n+ [[SPARK-43038]](https://issues.apache.org/jira/browse/SPARK-43038) Support the CBC mode by `aes_encrypt()`/`aes_decrypt()`\n+ [[SPARK-42971]](https://issues.apache.org/jira/browse/SPARK-42971) Change to print `workdir` if `appDirs` is null when worker handle `WorkDirCleanup` event\n+ [[SPARK-43018]](https://issues.apache.org/jira/browse/SPARK-43018) Fix bug for INSERT commands with timestamp literals\n+ Operating system security updates.\n* April 11, 2023 \n+ Support legacy data source formats in the `SYNC` command.\n+ Fixes an issue in the %autoreload behavior in notebooks outside of a repo.\n+ Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.\n+ [[SPARK-42928]](https://issues.apache.org/jira/browse/SPARK-42928) Makes `resolvePersistentFunction` synchronized.\n+ [[SPARK-42936]](https://issues.apache.org/jira/browse/SPARK-42936) Fixes an LCA issue when the clause can be resolved directly by its child aggregate.\n+ [[SPARK-42967]](https://issues.apache.org/jira/browse/SPARK-42967) Fixes `SparkListenerTaskStart.stageAttemptId` when a task starts after the stage 
is canceled.\n+ Operating system security updates.\n* March 29, 2023 \n+ Databricks SQL now supports specifying default values for columns of Delta Lake tables, either at table creation time or afterward. Subsequent `INSERT`, `UPDATE`, `DELETE`, and `MERGE` commands can refer to any column\u2019s default value using the explicit `DEFAULT` keyword. In addition, if any `INSERT` assignment has an explicit list of fewer columns than the target table, corresponding column default values are substituted for the remaining columns (or NULL if no default is specified). \nFor example: \n```\nCREATE TABLE t (first INT, second DATE DEFAULT CURRENT_DATE()) USING delta;\nINSERT INTO t VALUES (0, DEFAULT);\nINSERT INTO t VALUES (1, DEFAULT);\nSELECT first, second FROM t;\n> 0, 2023-03-28\n1, 2023-03-28\n\n```\n+ Auto Loader now initiates at least one synchronous RocksDB log cleanup for `Trigger.AvailableNow` streams to check that the checkpoint can get regularly cleaned up for fast-running Auto Loader streams. 
This can cause some streams to take longer before they shut down, but it will save you storage costs and improve the Auto Loader experience in future runs.\n+ You can now modify a Delta table to add support to table features using `DeltaTable.addFeatureSupport(feature_name)`.\n+ [[SPARK-42794]](https://issues.apache.org/jira/browse/SPARK-42794) Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structured Streaming\n+ [[SPARK-42521]](https://issues.apache.org/jira/browse/SPARK-42521) Add NULLs for INSERTs with user-specified lists of fewer columns than the target table\n+ [[SPARK-42702]](https://issues.apache.org/jira/browse/SPARK-42702)[[SPARK-42623]](https://issues.apache.org/jira/browse/SPARK-42623) Support parameterized query in subquery and CTE\n+ [[SPARK-42668]](https://issues.apache.org/jira/browse/SPARK-42668) Catch exception while trying to close the compressed stream in HDFSStateStoreProvider stop\n+ [[SPARK-42403]](https://issues.apache.org/jira/browse/SPARK-42403) JsonProtocol should handle null JSON strings\n* March 8, 2023 \n+ The error message \u201cFailure to initialize configuration\u201d has been improved to provide more context for the customer.\n+ There is a terminology change for adding features to a Delta table using the table property. The preferred syntax is now `'delta.feature.featureName'='supported'` instead of `'delta.feature.featureName'='enabled'`. 
For backward compatibility, using `'delta.feature.featureName'='enabled'` still works and will continue to work.\n+ Starting from this release, it is possible to create/replace a table with an additional table property `delta.ignoreProtocolDefaults` to ignore protocol-related Spark configs, which includes default reader and writer versions and table features supported by default.\n+ [[SPARK-42070]](https://issues.apache.org/jira/browse/SPARK-42070) Change the default value of the argument of the Mask function from -1 to NULL\n+ [[SPARK-41793]](https://issues.apache.org/jira/browse/SPARK-41793) Incorrect result for window frames defined by a range clause on significant decimals\n+ [[SPARK-42484]](https://issues.apache.org/jira/browse/SPARK-42484) UnsafeRowUtils better error message\n+ [[SPARK-42516]](https://issues.apache.org/jira/browse/SPARK-42516) Always capture the session time zone config while creating views\n+ [[SPARK-42635]](https://issues.apache.org/jira/browse/SPARK-42635) Fix the TimestampAdd expression.\n+ [[SPARK-42622]](https://issues.apache.org/jira/browse/SPARK-42622) Turned off substitution in values\n+ [[SPARK-42534]](https://issues.apache.org/jira/browse/SPARK-42534) Fix DB2Dialect Limit clause\n+ [[SPARK-42121]](https://issues.apache.org/jira/browse/SPARK-42121) Add built-in table-valued functions posexplode, posexplode\\_outer, json\\_tuple and stack\n+ [[SPARK-42045]](https://issues.apache.org/jira/browse/SPARK-42045) ANSI SQL mode: Round/Bround should return an error on tiny/small/significant integer overflow\n+ Operating system security updates. \n### [Databricks Runtime 11.3 LTS](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id4) \nSee [Databricks Runtime 11.3 LTS](https://docs.databricks.com/release-notes/runtime/11.3lts.html). 
\n* November 29, 2023 \n+ Fixed an issue where escaped underscores in `getColumns` operations originating from JDBC or ODBC clients were wrongly interpreted as wildcards.\n+ [[SPARK-43973]](https://issues.apache.org/jira/browse/SPARK-43973) Structured Streaming UI now displays failed queries correctly.\n+ [[SPARK-45730]](https://issues.apache.org/jira/browse/SPARK-45730) Improved time constraints for `ReloadingX509TrustManagerSuite`.\n+ [[SPARK-45544]](https://issues.apache.org/jira/browse/SPARK-45544) Integrated SSL support into `TransportContext`.\n+ [[SPARK-45859]](https://issues.apache.org/jira/browse/SPARK-45859) Made UDF objects in `ml.functions` lazy.\n+ [[SPARK-43718]](https://issues.apache.org/jira/browse/SPARK-43718) Fixed nullability for keys in `USING` joins.\n+ [[SPARK-44846]](https://issues.apache.org/jira/browse/SPARK-44846) Removed complex grouping expressions after `RemoveRedundantAggregates`.\n+ Operating system security updates.\n* November 14, 2023 \n+ Partition filters on Delta Lake streaming queries are pushed down before rate limiting to achieve better utilization.\n+ [[SPARK-42205]](https://issues.apache.org/jira/browse/SPARK-42205) Removed logging accumulables in Stage and Task start events.\n+ [[SPARK-45545]](https://issues.apache.org/jira/browse/SPARK-45545) `SparkTransportConf` inherits `SSLOptions` upon creation.\n+ Revert [[SPARK-33861]](https://issues.apache.org/jira/browse/SPARK-33861).\n+ [[SPARK-45541]](https://issues.apache.org/jira/browse/SPARK-45541) Added `SSLFactory`.\n+ [[SPARK-45429]](https://issues.apache.org/jira/browse/SPARK-45429) Added helper classes for SSL RPC communication.\n+ [[SPARK-45584]](https://issues.apache.org/jira/browse/SPARK-45584) Fixed subquery run failure with `TakeOrderedAndProjectExec`.\n+ [[SPARK-45430]](https://issues.apache.org/jira/browse/SPARK-45430) `FramelessOffsetWindowFunction` no longer fails when `IGNORE NULLS` and `offset > rowCount`.\n+ 
[[SPARK-45427]](https://issues.apache.org/jira/browse/SPARK-45427) Added RPC SSL settings to `SSLOptions` and `SparkTransportConf`.\n+ Operating system security updates.\n* October 24, 2023 \n+ [[SPARK-45426]](https://issues.apache.org/jira/browse/SPARK-45426) Added support for `ReloadingX509TrustManager`.\n+ Miscellaneous fixes.\n* October 13, 2023 \n+ Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.\n+ [[SPARK-45178]](https://issues.apache.org/jira/browse/SPARK-45178) Fallback to running a single batch for `Trigger.AvailableNow` with unsupported sources rather than using the wrapper.\n+ [[SPARK-45084]](https://issues.apache.org/jira/browse/SPARK-45084) `StateOperatorProgress` to use an accurate, adequate shuffle partition number.\n+ [[SPARK-45346]](https://issues.apache.org/jira/browse/SPARK-45346) Parquet schema inference now respects case-sensitive flag when merging a schema.\n+ Operating system security updates.\n* September 10, 2023 \n+ Miscellaneous fixes.\n* August 30, 2023 \n+ [[SPARK-44818]](https://issues.apache.org/jira/browse/SPARK-44818) Fixed race for pending task interrupt issued before `taskThread` is initialized.\n+ [[SPARK-44871]](https://issues.apache.org/jira/browse/SPARK-44871)[11.3-13.0] Fixed `percentile_disc` behavior.\n+ Operating system security updates.\n* August 15, 2023 \n+ [[SPARK-44485]](https://issues.apache.org/jira/browse/SPARK-44485) Optimized `TreeNode.generateTreeString`.\n+ [[SPARK-44504]](https://issues.apache.org/jira/browse/SPARK-44504) Maintenance task cleans up loaded providers on stop error.\n+ [[SPARK-44464]](https://issues.apache.org/jira/browse/SPARK-44464) Fixed `applyInPandasWithStatePythonRunner` to output rows that have `Null` as the first column value.\n+ Operating system security updates.\n* July 27, 2023 \n+ Fixed an issue where `dbutils.fs.ls()` returned `INVALID_PARAMETER_VALUE.LOCATION_OVERLAP` when called for a storage location path which clashed with other external or managed storage 
location.\n+ [[SPARK-44199]](https://issues.apache.org/jira/browse/SPARK-44199) `CacheManager` no longer refreshes the `fileIndex` unnecessarily.\n+ Operating system security updates.\n* July 24, 2023 \n+ [[SPARK-44136]](https://issues.apache.org/jira/browse/SPARK-44136) Fixed an issue where `StateManager` would get materialized in an executor instead of the driver in `FlatMapGroupsWithStateExec`.\n+ Operating system security updates.\n* June 23, 2023 \n+ Operating system security updates.\n* June 15, 2023 \n+ Photonized `approx_count_distinct`.\n+ Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.\n+ [[SPARK-43779]](https://issues.apache.org/jira/browse/SPARK-43779) `ParseToDate` now loads `EvalMode` in the main thread.\n+ [[SPARK-40862]](https://issues.apache.org/jira/browse/SPARK-40862) Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery\n+ [[SPARK-43156]](https://issues.apache.org/jira/browse/SPARK-43156)[[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098) Extended scalar subquery count bug test with `decorrelateInnerQuery` turned off.\n+ [[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098) Fix correctness COUNT bug when scalar subquery has a group by clause\n+ Operating system security updates.\n* June 2, 2023 \n+ The JSON parser in `failOnUnknownFields` mode drops a record in `DROPMALFORMED` mode and fails directly in `FAILFAST` mode.\n+ Improve the performance of incremental updates with `SHALLOW CLONE` Iceberg and Parquet.\n+ Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. 
This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.\n+ [[SPARK-43404]](https://issues.apache.org/jira/browse/SPARK-43404) Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.\n+ [[SPARK-43527]](https://issues.apache.org/jira/browse/SPARK-43527) Fixed `catalog.listCatalogs` in PySpark.\n+ [[SPARK-43413]](https://issues.apache.org/jira/browse/SPARK-43413)[11.3-13.0] Fixed `IN` subquery `ListQuery` nullability.\n+ [[SPARK-43340]](https://issues.apache.org/jira/browse/SPARK-43340) Fixed missing stack trace field in eventlogs. \n### [Databricks Runtime 10.4 LTS](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id5) \nSee [Databricks Runtime 10.4 LTS](https://docs.databricks.com/release-notes/runtime/10.4lts.html). \n* November 29, 2023 \n+ [[SPARK-45544]](https://issues.apache.org/jira/browse/SPARK-45544) Integrated SSL support into `TransportContext`.\n+ [[SPARK-45859]](https://issues.apache.org/jira/browse/SPARK-45859) Made UDF objects in `ml.functions` lazy.\n+ [[SPARK-43718]](https://issues.apache.org/jira/browse/SPARK-43718) Fixed nullability for keys in `USING` joins.\n+ [[SPARK-45730]](https://issues.apache.org/jira/browse/SPARK-45730) Improved time constraints for `ReloadingX509TrustManagerSuite`.\n+ [[SPARK-42205]](https://issues.apache.org/jira/browse/SPARK-42205) Removed logging accumulables in Stage and Task start events.\n+ [[SPARK-44846]](https://issues.apache.org/jira/browse/SPARK-44846) Removed complex grouping expressions after `RemoveRedundantAggregates`.\n+ Operating system security updates.\n* November 14, 2023 \n+ [[SPARK-45541]](https://issues.apache.org/jira/browse/SPARK-45541) Added `SSLFactory`.\n+ [[SPARK-45545]](https://issues.apache.org/jira/browse/SPARK-45545) `SparkTransportConf` inherits `SSLOptions` upon creation.\n+ [[SPARK-45427]](https://issues.apache.org/jira/browse/SPARK-45427) 
Added RPC SSL settings to `SSLOptions` and `SparkTransportConf`.\n+ [[SPARK-45429]](https://issues.apache.org/jira/browse/SPARK-45429) Added helper classes for SSL RPC communication.\n+ [[SPARK-45584]](https://issues.apache.org/jira/browse/SPARK-45584) Fixed subquery run failure with `TakeOrderedAndProjectExec`.\n+ Revert [[SPARK-33861]](https://issues.apache.org/jira/browse/SPARK-33861).\n+ Operating system security updates.\n* October 24, 2023 \n+ [[SPARK-45426]](https://issues.apache.org/jira/browse/SPARK-45426) Added support for `ReloadingX509TrustManager`.\n+ Operating system security updates.\n* October 13, 2023 \n+ [[SPARK-45084]](https://issues.apache.org/jira/browse/SPARK-45084) `StateOperatorProgress` to use an accurate, adequate shuffle partition number.\n+ [[SPARK-45178]](https://issues.apache.org/jira/browse/SPARK-45178) Fallback to running a single batch for `Trigger.AvailableNow` with unsupported sources rather than using the wrapper.\n+ Operating system security updates.\n* September 10, 2023 \n+ Miscellaneous fixes.\n* August 30, 2023 \n+ [[SPARK-44818]](https://issues.apache.org/jira/browse/SPARK-44818) Fixed race for pending task interrupt issued before `taskThread` is initialized.\n+ Operating system security updates.\n* August 15, 2023 \n+ [[SPARK-44504]](https://issues.apache.org/jira/browse/SPARK-44504) Maintenance task cleans up loaded providers on stop error.\n+ [[SPARK-43973]](https://issues.apache.org/jira/browse/SPARK-43973) Structured Streaming UI now displays failed queries correctly.\n+ Operating system security updates.\n* June 23, 2023 \n+ Operating system security updates.\n* June 15, 2023 \n+ Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.\n+ [[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098) Fix correctness COUNT bug when scalar subquery has a group by clause\n+ [[SPARK-40862]](https://issues.apache.org/jira/browse/SPARK-40862) Support non-aggregated subqueries in 
RewriteCorrelatedScalarSubquery\n+ [[SPARK-43156]](https://issues.apache.org/jira/browse/SPARK-43156)[[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098) Extended scalar subquery count test with `decorrelateInnerQuery` turned off.\n+ Operating system security updates.\n* June 2, 2023 \n+ The JSON parser in `failOnUnknownFields` mode drops a record in `DROPMALFORMED` mode and fails directly in `FAILFAST` mode.\n+ Fixed an issue in JSON rescued data parsing to prevent `UnknownFieldException`.\n+ Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.\n+ [[SPARK-43404]](https://issues.apache.org/jira/browse/SPARK-43404) Skip reusing the sst file for the same version of RocksDB state store to avoid the ID mismatch error.\n+ [[SPARK-43413]](https://issues.apache.org/jira/browse/SPARK-43413) Fixed `IN` subquery `ListQuery` nullability.\n+ Operating system security updates.\n* May 17, 2023 \n+ Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. 
File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.\n+ [[SPARK-41520]](https://issues.apache.org/jira/browse/SPARK-41520) Split `AND_OR` tree pattern to separate `AND` and `OR`.\n+ [[SPARK-43190]](https://issues.apache.org/jira/browse/SPARK-43190) `ListQuery.childOutput` is now consistent with secondary output.\n+ Operating system security updates.\n* April 25, 2023 \n+ [[SPARK-42928]](https://issues.apache.org/jira/browse/SPARK-42928) Make `resolvePersistentFunction` synchronized.\n+ Operating system security updates.\n* April 11, 2023 \n+ Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.\n+ [[SPARK-42937]](https://issues.apache.org/jira/browse/SPARK-42937) `PlanSubqueries` now sets `InSubqueryExec#shouldBroadcast` to true.\n+ [[SPARK-42967]](https://issues.apache.org/jira/browse/SPARK-42967) Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is canceled.\n* March 29, 2023 \n+ [[SPARK-42668]](https://issues.apache.org/jira/browse/SPARK-42668) Catch exception while trying to close the compressed stream in HDFSStateStoreProvider stop\n+ [[SPARK-42635]](https://issues.apache.org/jira/browse/SPARK-42635) Fix the \u2026\n+ Operating system security updates.\n* March 14, 2023 \n+ [[SPARK-41162]](https://issues.apache.org/jira/browse/SPARK-41162) Fix anti- and semi-join for self-join with aggregations\n+ [[SPARK-33206]](https://issues.apache.org/jira/browse/SPARK-33206) Fix shuffle index cache weight calculation for small index files\n+ [[SPARK-42484]](https://issues.apache.org/jira/browse/SPARK-42484) Improved the `UnsafeRowUtils` error message\n+ Miscellaneous fixes.\n* February 28, 2023 \n+ Support generated column for yyyy-MM-dd date\\_format. 
This change supports partition pruning for yyyy-MM-dd as a date\\_format in generated columns.\n+ Users can now read and write specific Delta tables requiring Reader version 3 and Writer version 7, using Databricks Runtime 9.1 LTS or later. To succeed, table features listed in the tables\u2019 protocol must be supported by the current version of Databricks Runtime.\n+ Operating system security updates.\n* February 16, 2023 \n+ [[SPARK-30220]](https://issues.apache.org/jira/browse/SPARK-30220) Enable using Exists/In subqueries outside of the Filter node\n+ Operating system security updates.\n* January 31, 2023 \n+ Table types of JDBC tables are now EXTERNAL by default.\n* January 18, 2023 \n+ Azure Synapse connector returns a more descriptive error message when a column name contains invalid characters such as whitespace or semicolons. In such cases, the following message will be returned: `Azure Synapse Analytics failed to run the JDBC query produced by the connector. 
Check column names do not include not valid characters such as ';' or white space`.\n+ [[SPARK-38277]](https://issues.apache.org/jira/browse/SPARK-38277) Clear write batch after RocksDB state store\u2019s commit\n+ [[SPARK-41199]](https://issues.apache.org/jira/browse/SPARK-41199) Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used\n+ [[SPARK-41198]](https://issues.apache.org/jira/browse/SPARK-41198) Fix metrics in streaming query having CTE and DSv1 streaming source\n+ [[SPARK-41339]](https://issues.apache.org/jira/browse/SPARK-41339) Close and recreate RocksDB write batch instead of just clearing\n+ [[SPARK-41732]](https://issues.apache.org/jira/browse/SPARK-41732) Apply tree-pattern based pruning for the rule SessionWindowing\n+ Operating system security updates.\n* November 29, 2022 \n+ Users can configure leading and trailing whitespaces\u2019 behavior when writing data using the Redshift connector. The following options have been added to control white space handling: \n- `csvignoreleadingwhitespace`, when set to `true`, removes leading white space from values during writes when `tempformat` is set to `CSV` or `CSV GZIP`. Whitespaces are retained when the config is set to `false`. By default, the value is `true`.\n- `csvignoretrailingwhitespace`, when set to `true`, removes trailing white space from values during writes when `tempformat` is set to `CSV` or `CSV GZIP`. Whitespaces are retained when the config is set to `false`. 
By default, the value is `true`.\n+ Fixed an issue with JSON parsing in Auto Loader when all columns were left as strings (`cloudFiles.inferColumnTypes` was not set or set to `false`) and the JSON contained nested objects.\n+ Operating system security updates.\n* November 15, 2022 \n+ Upgraded Apache commons-text to 1.10.0.\n+ [[SPARK-40646]](https://issues.apache.org/jira/browse/SPARK-40646) JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt-in for the improved behavior, set `spark.sql.json.enablePartialResults` to `true`. The flag is turned off by default to preserve the original behavior.\n+ [[SPARK-40292]](https://issues.apache.org/jira/browse/SPARK-40292) Fix column names in `arrays_zip` function when arrays are referenced from nested structs\n+ Operating system security updates.\n* November 1, 2022 \n+ Fixed an issue where if a Delta table had a user-defined column named `_change_type`, but **Change data feed** was turned off on that table, data in that column would incorrectly fill with NULL values when running `MERGE`.\n+ Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when `allowOverwrites` is enabled\n+ [[SPARK-40697]](https://issues.apache.org/jira/browse/SPARK-40697) Add read-side char padding to cover external data files\n+ [[SPARK-40596]](https://issues.apache.org/jira/browse/SPARK-40596) Populate ExecutorDecommission with messages in ExecutorDecommissionInfo\n+ Operating system security updates.\n* October 18, 2022 \n+ Operating system security updates.\n* October 5, 2022 \n+ [[SPARK-40468]](https://issues.apache.org/jira/browse/SPARK-40468) Fix column pruning in CSV when `_corrupt_record` is selected.\n+ Operating system security updates.\n* September 22, 2022 \n+ Users can set spark.conf.set(`spark.databricks.io.listKeysWithPrefix.azure.enabled`, `true`) to 
re-enable the built-in listing for Auto Loader on ADLS Gen2. Built-in listing was previously turned off due to performance issues but may have led to increased storage costs for customers.\n+ [[SPARK-40315]](https://issues.apache.org/jira/browse/SPARK-40315) Add hashCode() for Literal of ArrayBasedMapData\n+ [[SPARK-40213]](https://issues.apache.org/jira/browse/SPARK-40213) Support ASCII value conversion for Latin-1 characters\n+ [[SPARK-40380]](https://issues.apache.org/jira/browse/SPARK-40380) Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan\n+ [[SPARK-38404]](https://issues.apache.org/jira/browse/SPARK-38404) Improve CTE resolution when a nested CTE references an outer CTE\n+ [[SPARK-40089]](https://issues.apache.org/jira/browse/SPARK-40089) Fix sorting for some Decimal types\n+ [[SPARK-39887]](https://issues.apache.org/jira/browse/SPARK-39887) RemoveRedundantAliases should keep aliases that make the output of projection nodes unique\n* September 6, 2022 \n+ [[SPARK-40235]](https://issues.apache.org/jira/browse/SPARK-40235) Use interruptible lock instead of synchronized in Executor.updateDependencies()\n+ [[SPARK-40218]](https://issues.apache.org/jira/browse/SPARK-40218) GROUPING SETS should preserve the grouping columns\n+ [[SPARK-39976]](https://issues.apache.org/jira/browse/SPARK-39976) ArrayIntersect should handle null in left expression correctly\n+ [[SPARK-40053]](https://issues.apache.org/jira/browse/SPARK-40053) Add `assume` to dynamic cancel cases which require Python runtime environment\n+ [[SPARK-35542]](https://issues.apache.org/jira/browse/SPARK-35542) Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it\n+ [[SPARK-40079]](https://issues.apache.org/jira/browse/SPARK-40079) Add Imputer inputCols validation for empty input case\n* August 24, 2022 \n+ [[SPARK-39983]](https://issues.apache.org/jira/browse/SPARK-39983) Do not cache 
unserialized broadcast relations on the driver\n+ [[SPARK-39775]](https://issues.apache.org/jira/browse/SPARK-39775) Disable validate default values when parsing Avro schemas\n+ [[SPARK-39962]](https://issues.apache.org/jira/browse/SPARK-39962) Apply projection when group attributes are empty\n+ [[SPARK-37643]](https://issues.apache.org/jira/browse/SPARK-37643) when charVarcharAsString is true, for char datatype predicate query should skip rpadding rule\n+ Operating system security updates.\n* August 9, 2022 \n+ [[SPARK-39847]](https://issues.apache.org/jira/browse/SPARK-39847) Fix race condition in RocksDBLoader.loadLibrary() if the caller thread is interrupted\n+ [[SPARK-39731]](https://issues.apache.org/jira/browse/SPARK-39731) Fix issue in CSV and JSON data sources when parsing dates in \u201cyyyyMMdd\u201d format with CORRECTED time parser policy\n+ Operating system security updates.\n* July 27, 2022 \n+ [[SPARK-39625]](https://issues.apache.org/jira/browse/SPARK-39625) Add Dataset.as(StructType)\n+ [[SPARK-39689]](https://issues.apache.org/jira/browse/SPARK-39689) Support 2-chars `lineSep` in CSV data source\n+ [[SPARK-39104]](https://issues.apache.org/jira/browse/SPARK-39104) InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe\n+ [[SPARK-39570]](https://issues.apache.org/jira/browse/SPARK-39570) Inline table should allow expressions with alias\n+ [[SPARK-39702]](https://issues.apache.org/jira/browse/SPARK-39702) Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel\n+ [[SPARK-39575]](https://issues.apache.org/jira/browse/SPARK-39575) add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer\n+ [[SPARK-39476]](https://issues.apache.org/jira/browse/SPARK-39476) Disable Unwrap cast optimize when casting from Long to Float/ Double or from Integer to Float\n+ [[SPARK-38868]](https://issues.apache.org/jira/browse/SPARK-38868) Don\u2019t propagate exceptions from filter predicate when optimizing outer 
joins\n+ Operating system security updates.\n* July 20, 2022 \n+ Make Delta MERGE operation results consistent when the source is non-deterministic.\n+ [[SPARK-39355]](https://issues.apache.org/jira/browse/SPARK-39355) Single column uses quoted to construct UnresolvedAttribute\n+ [[SPARK-39548]](https://issues.apache.org/jira/browse/SPARK-39548) CreateView Command with a window clause query hits a wrong window definition not found issue\n+ [[SPARK-39419]](https://issues.apache.org/jira/browse/SPARK-39419) Fix ArraySort to throw an exception when the comparator returns null\n+ Turned off Auto Loader\u2019s use of built-in cloud APIs for directory listing on Azure.\n+ Operating system security updates.\n* July 5, 2022 \n+ [[SPARK-39376]](https://issues.apache.org/jira/browse/SPARK-39376) Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN\n+ Operating system security updates.\n* June 15, 2022 \n+ [[SPARK-39283]](https://issues.apache.org/jira/browse/SPARK-39283) Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator\n+ [[SPARK-39285]](https://issues.apache.org/jira/browse/SPARK-39285) Spark should not check field names when reading files\n+ [[SPARK-34096]](https://issues.apache.org/jira/browse/SPARK-34096) Improve performance for nth\\_value ignore nulls over offset window\n+ [[SPARK-36718]](https://issues.apache.org/jira/browse/SPARK-36718) Fix the `isExtractOnly` check in CollapseProject\n* June 2, 2022 \n+ [[SPARK-39093]](https://issues.apache.org/jira/browse/SPARK-39093) Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral\n+ [[SPARK-38990]](https://issues.apache.org/jira/browse/SPARK-38990) Avoid NullPointerException when evaluating date\\_trunc/trunc format as a bound reference\n+ Operating system security updates.\n* May 18, 2022 \n+ Fixes a potential built-in memory leak in Auto Loader.\n+ 
[[SPARK-38918]](https://issues.apache.org/jira/browse/SPARK-38918) Nested column pruning should filter out attributes that do not belong to the current relation\n+ [[SPARK-37593]](https://issues.apache.org/jira/browse/SPARK-37593) Reduce default page size by LONG\\_ARRAY\\_OFFSET if G1GC and ON\\_HEAP are used\n+ [[SPARK-39084]](https://issues.apache.org/jira/browse/SPARK-39084) Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion\n+ [[SPARK-32268]](https://issues.apache.org/jira/browse/SPARK-32268) Add ColumnPruning in injectBloomFilter\n+ [[SPARK-38974]](https://issues.apache.org/jira/browse/SPARK-38974) Filter registered functions with a given database name in list functions\n+ [[SPARK-38931]](https://issues.apache.org/jira/browse/SPARK-38931) Create root dfs directory for RocksDBFileManager with an unknown number of keys on 1st checkpoint\n+ Operating system security updates.\n* April 19, 2022 \n+ Upgraded Java AWS SDK from version 1.11.655 to 1.12.1899.\n+ Fixed an issue with notebook-scoped libraries not working in batch streaming jobs.\n+ [[SPARK-38616]](https://issues.apache.org/jira/browse/SPARK-38616) Keep track of SQL query text in Catalyst TreeNode\n+ Operating system security updates.\n* April 6, 2022 \n+ The following Spark SQL functions are now available with this release: \n- `timestampadd()` and `dateadd()`: Add a time duration in a specified unit to a timestamp expression.\n- `timestampdiff()` and `datediff()`: Calculate the time difference between two timestamp expressions in a specified unit.\n+ Parquet-MR has been upgraded to 1.12.2\n+ Improved support for comprehensive schemas in parquet files\n+ [[SPARK-38631]](https://issues.apache.org/jira/browse/SPARK-38631) Uses Java-based implementation for un-tarring at Utils.unpack\n+ [[SPARK-38509]](https://issues.apache.org/jira/browse/SPARK-38509)[[SPARK-38481]](https://issues.apache.org/jira/browse/SPARK-38481) Cherry-pick three `timestampadd/diff` changes.\n+ 
[[SPARK-38523]](https://issues.apache.org/jira/browse/SPARK-38523) Fix referring to the corrupt record column from CSV\n+ [[SPARK-38237]](https://issues.apache.org/jira/browse/SPARK-38237) Allow `ClusteredDistribution` to require full clustering keys\n+ [[SPARK-38437]](https://issues.apache.org/jira/browse/SPARK-38437) Lenient serialization of datetime from datasource\n+ [[SPARK-38180]](https://issues.apache.org/jira/browse/SPARK-38180) Allow safe up-cast expressions in correlated equality predicates\n+ [[SPARK-38155]](https://issues.apache.org/jira/browse/SPARK-38155) Disallow distinct aggregate in lateral subqueries with unsupported predicates\n+ Operating system security updates. \n### [Databricks Runtime 9.1 LTS](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id6) \nSee [Databricks Runtime 9.1 LTS](https://docs.databricks.com/release-notes/runtime/9.1lts.html). \n* November 29, 2023 \n+ [[SPARK-45859]](https://issues.apache.org/jira/browse/SPARK-45859) Made UDF objects in `ml.functions` lazy.\n+ [[SPARK-45544]](https://issues.apache.org/jira/browse/SPARK-45544) Integrated SSL support into `TransportContext`.\n+ [[SPARK-45730]](https://issues.apache.org/jira/browse/SPARK-45730) Improved time constraints for `ReloadingX509TrustManagerSuite`.\n+ Operating system security updates.\n* November 14, 2023 \n+ [[SPARK-45545]](https://issues.apache.org/jira/browse/SPARK-45545) `SparkTransportConf` inherits `SSLOptions` upon creation.\n+ [[SPARK-45429]](https://issues.apache.org/jira/browse/SPARK-45429) Added helper classes for SSL RPC communication.\n+ [[SPARK-45427]](https://issues.apache.org/jira/browse/SPARK-45427) Added RPC SSL settings to `SSLOptions` and `SparkTransportConf`.\n+ [[SPARK-45584]](https://issues.apache.org/jira/browse/SPARK-45584) Fixed subquery run failure with `TakeOrderedAndProjectExec`.\n+ [[SPARK-45541]](https://issues.apache.org/jira/browse/SPARK-45541) Added `SSLFactory`.\n+ 
[[SPARK-42205]](https://issues.apache.org/jira/browse/SPARK-42205) Removed logging accumulables in Stage and Task start events.\n+ Operating system security updates.\n* October 24, 2023 \n+ [[SPARK-45426]](https://issues.apache.org/jira/browse/SPARK-45426) Added support for `ReloadingX509TrustManager`.\n+ Operating system security updates.\n* October 13, 2023 \n+ Operating system security updates.\n* September 10, 2023 \n+ Miscellaneous fixes.\n* August 30, 2023 \n+ Operating system security updates.\n* August 15, 2023 \n+ Operating system security updates.\n* June 23, 2023 \n+ Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.\n+ Operating system security updates.\n* June 15, 2023 \n+ [[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098) Fix correctness COUNT bug when scalar subquery has a group by clause\n+ [[SPARK-43156]](https://issues.apache.org/jira/browse/SPARK-43156)[[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098) Extend scalar subquery count bug test with `decorrelateInnerQuery` turned off.\n+ [[SPARK-40862]](https://issues.apache.org/jira/browse/SPARK-40862) Support non-aggregated subqueries in RewriteCorrelatedScalarSubquery\n+ Operating system security updates.\n* June 2, 2023 \n+ The JSON parser in `failOnUnknownFields` mode drops a record in `DROPMALFORMED` mode and fails directly in `FAILFAST` mode.\n+ Fixed an issue in JSON rescued data parsing to prevent `UnknownFieldException`.\n+ Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. 
This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.\n+ [[SPARK-37520]](https://issues.apache.org/jira/browse/SPARK-37520) Add the `startswith()` and `endswith()` string functions\n+ [[SPARK-43413]](https://issues.apache.org/jira/browse/SPARK-43413) Fixed `IN` subquery `ListQuery` nullability.\n+ Operating system security updates.\n* May 17, 2023 \n+ Operating system security updates.\n* April 25, 2023 \n+ Operating system security updates.\n* April 11, 2023 \n+ Fixed an issue where Auto Loader schema evolution can go into an infinite fail loop when a new column is detected in the schema of a nested JSON object.\n+ [[SPARK-42967]](https://issues.apache.org/jira/browse/SPARK-42967) Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is canceled.\n* March 29, 2023 \n+ Operating system security updates.\n* March 14, 2023 \n+ [[SPARK-42484]](https://issues.apache.org/jira/browse/SPARK-42484) Improved error message for `UnsafeRowUtils`.\n+ Miscellaneous fixes.\n* February 28, 2023 \n+ Users can now read and write specific Delta tables requiring Reader version 3 and Writer version 7, using Databricks Runtime 9.1 LTS or later. 
To succeed, table features listed in the tables\u2019 protocol must be supported by the current version of Databricks Runtime.\n+ Operating system security updates.\n* February 16, 2023 \n+ Operating system security updates.\n* January 31, 2023 \n+ Table types of JDBC tables are now EXTERNAL by default.\n* January 18, 2023 \n+ Operating system security updates.\n* November 29, 2022 \n+ Fixed an issue with JSON parsing in Auto Loader when all columns were left as strings (`cloudFiles.inferColumnTypes` was not set or set to `false`) and the JSON contained nested objects.\n+ Operating system security updates.\n* November 15, 2022 \n+ Upgraded Apache commons-text to 1.10.0.\n+ Operating system security updates.\n+ Miscellaneous fixes.\n* November 1, 2022 \n+ Fixed an issue where if a Delta table had a user-defined column named `_change_type`, but **Change data feed** was turned off on that table, data in that column would incorrectly fill with NULL values when running `MERGE`.\n+ Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when `allowOverwrites` is enabled\n+ [[SPARK-40596]](https://issues.apache.org/jira/browse/SPARK-40596) Populate ExecutorDecommission with messages in ExecutorDecommissionInfo\n+ Operating system security updates.\n* October 18, 2022 \n+ Operating system security updates.\n* October 5, 2022 \n+ Miscellaneous fixes.\n+ Operating system security updates.\n* September 22, 2022 \n+ Users can set spark.conf.set(\u201cspark.databricks.io.listKeysWithPrefix.azure.enabled\u201d, \u201ctrue\u201d) to re-enable the built-in listing for Auto Loader on ADLS Gen2. 
Built-in listing was previously turned off due to performance issues but may have led to increased storage costs for customers.\n+ [[SPARK-40315]](https://issues.apache.org/jira/browse/SPARK-40315) Add hashCode() for Literal of ArrayBasedMapData\n+ [[SPARK-40089]](https://issues.apache.org/jira/browse/SPARK-40089) Fix sorting for some Decimal types\n+ [[SPARK-39887]](https://issues.apache.org/jira/browse/SPARK-39887) RemoveRedundantAliases should keep aliases that make the output of projection nodes unique\n* September 6, 2022 \n+ [[SPARK-40235]](https://issues.apache.org/jira/browse/SPARK-40235) Use interruptible lock instead of synchronized in Executor.updateDependencies()\n+ [[SPARK-35542]](https://issues.apache.org/jira/browse/SPARK-35542) Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it\n+ [[SPARK-40079]](https://issues.apache.org/jira/browse/SPARK-40079) Add Imputer inputCols validation for empty input case\n* August 24, 2022 \n+ [[SPARK-39666]](https://issues.apache.org/jira/browse/SPARK-39666) Use UnsafeProjection.create to respect `spark.sql.codegen.factoryMode` in ExpressionEncoder\n+ [[SPARK-39962]](https://issues.apache.org/jira/browse/SPARK-39962) Apply projection when group attributes are empty\n+ Operating system security updates.\n* August 9, 2022 \n+ Operating system security updates.\n* July 27, 2022 \n+ Make Delta MERGE operation results consistent when the source is non-deterministic.\n+ [[SPARK-39689]](https://issues.apache.org/jira/browse/SPARK-39689) Support for 2-chars `lineSep` in CSV data source\n+ [[SPARK-39575]](https://issues.apache.org/jira/browse/SPARK-39575) Added `ByteBuffer#rewind` after `ByteBuffer#get` in `AvroDeserializer`.\n+ [[SPARK-37392]](https://issues.apache.org/jira/browse/SPARK-37392) Fixed the performance error for catalyst optimizer.\n+ Operating system security updates.\n* July 13, 2022 \n+ 
[[SPARK-39419]](https://issues.apache.org/jira/browse/SPARK-39419) `ArraySort` throws an exception when the comparator returns null.\n+ Turned off Auto Loader\u2019s use of built-in cloud APIs for directory listing on Azure.\n+ Operating system security updates.\n* July 5, 2022 \n+ Operating system security updates.\n+ Miscellaneous fixes.\n* June 15, 2022 \n+ [[SPARK-39283]](https://issues.apache.org/jira/browse/SPARK-39283) Fix deadlock between `TaskMemoryManager` and `UnsafeExternalSorter.SpillableIterator`.\n* June 2, 2022 \n+ [[SPARK-34554]](https://issues.apache.org/jira/browse/SPARK-34554) Implement the `copy()` method in `ColumnarMap`.\n+ Operating system security updates.\n* May 18, 2022 \n+ Fixed a potential built-in memory leak in Auto Loader.\n+ Upgrade AWS SDK version from 1.11.655 to 1.11.678.\n+ [[SPARK-38918]](https://issues.apache.org/jira/browse/SPARK-38918) Nested column pruning should filter out attributes that do not belong to the current relation\n+ [[SPARK-39084]](https://issues.apache.org/jira/browse/SPARK-39084) Fix `df.rdd.isEmpty()` by using `TaskContext` to stop iterator on task completion\n+ Operating system security updates.\n* April 19, 2022 \n+ Operating system security updates.\n+ Miscellaneous fixes.\n* April 6, 2022 \n+ [[SPARK-38631]](https://issues.apache.org/jira/browse/SPARK-38631) Uses Java-based implementation for un-tarring at Utils.unpack\n+ Operating system security updates.\n* March 22, 2022 \n+ Changed the current working directory of notebooks on High Concurrency clusters with either table access control or credential passthrough enabled to the user\u2019s home directory. 
Previously, the active directory was `/databricks/driver`.\n+ [[SPARK-38437]](https://issues.apache.org/jira/browse/SPARK-38437) Lenient serialization of datetime from datasource\n+ [[SPARK-38180]](https://issues.apache.org/jira/browse/SPARK-38180) Allow safe up-cast expressions in correlated equality predicates\n+ [[SPARK-38155]](https://issues.apache.org/jira/browse/SPARK-38155) Disallow distinct aggregate in lateral subqueries with unsupported predicates\n+ [[SPARK-27442]](https://issues.apache.org/jira/browse/SPARK-27442) Removed a check field when reading or writing data in a parquet.\n* March 14, 2022 \n+ [[SPARK-38236]](https://issues.apache.org/jira/browse/SPARK-38236) Absolute file paths specified in the create/alter table are treated as relative\n+ [[SPARK-34069]](https://issues.apache.org/jira/browse/SPARK-34069) Interrupt task thread if local property `SPARK_JOB_INTERRUPT_ON_CANCEL` is set to true.\n* February 23, 2022 \n+ [[SPARK-37859]](https://issues.apache.org/jira/browse/SPARK-37859) SQL tables created with JDBC with Spark 3.1 are not readable with Spark 3.2.\n* February 8, 2022 \n+ [[SPARK-27442]](https://issues.apache.org/jira/browse/SPARK-27442) Removed a check field when reading or writing data in a parquet.\n+ Operating system security updates.\n* February 1, 2022 \n+ Operating system security updates.\n* January 26, 2022 \n+ Fixed an issue where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.\n+ Fixed an issue where the `OPTIMIZE` command could fail when the ANSI SQL dialect was enabled.\n* January 19, 2022 \n+ Minor fixes and security enhancements.\n+ Operating system security updates.\n* November 4, 2021 \n+ Fixed an issue that could cause Structured Streaming streams to fail with an `ArrayIndexOutOfBoundsException`.\n+ Fixed a race condition that might cause a query failure with an IOException like `java.io.IOException: No FileSystem for scheme` or that might cause 
modifications to `sparkContext.hadoopConfiguration` to not take effect in queries.\n+ The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.\n* October 20, 2021 \n+ Upgraded BigQuery connector from 0.18.1 to 0.22.2. This adds support for the BigNumeric type. \n### [Databricks Runtime 13.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id7) \nSee [Databricks Runtime 13.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/13.0.html). \n* October 13, 2023 \n+ Snowflake-jdbc dependency upgraded from 3.13.29 to 3.13.33.\n+ [[SPARK-42553]](https://issues.apache.org/jira/browse/SPARK-42553)[SQL] Ensure at least one time unit after interval.\n+ [[SPARK-45178]](https://issues.apache.org/jira/browse/SPARK-45178) Fallback to running a single batch for `Trigger.AvailableNow` with unsupported sources rather than using wrapper.\n+ [[SPARK-44658]](https://issues.apache.org/jira/browse/SPARK-44658)[CORE] `ShuffleStatus.getMapStatus` returns `None` instead of `Some(null)`.\n+ [[SPARK-42205]](https://issues.apache.org/jira/browse/SPARK-42205)[CORE] Remove logging of Accumulables in Task/Stage start events in `JsonProtocol`.\n+ Operating system security updates.\n* September 12, 2023 \n+ [[SPARK-44485]](https://issues.apache.org/jira/browse/SPARK-44485)[SQL] Optimize `TreeNode.generateTreeString`.\n+ [[SPARK-44718]](https://issues.apache.org/jira/browse/SPARK-44718)[SQL] Match `ColumnVector` memory-mode config default to `OffHeapMemoryMode` config value.\n+ Miscellaneous bug fixes.\n* August 30, 2023 \n+ [[SPARK-44818]](https://issues.apache.org/jira/browse/SPARK-44818)[Backport] Fixed race for pending task interrupt issued before `taskThread` is initialized.\n+ [[SPARK-44714]](https://issues.apache.org/jira/browse/SPARK-44714) Ease restriction of LCA resolution regarding queries.\n+ [[SPARK-44245]](https://issues.apache.org/jira/browse/SPARK-44245)[PYTHON] `pyspark.sql.dataframe sample()` 
doctests are now illustrative-only.\n+ [11.3-13.0][[SPARK-44871]][SQL] Fixed `percentile_disc` behavior.\n+ Operating system security updates.\n* August 15, 2023 \n+ [[SPARK-44643]](https://issues.apache.org/jira/browse/SPARK-44643)[SQL][PYTHON] Fix `Row.__repr__` when the row is empty.\n+ [[SPARK-44504]](https://issues.apache.org/jira/browse/SPARK-44504)[Backport] Maintenance task cleans up loaded providers on stop error.\n+ [[SPARK-44479]](https://issues.apache.org/jira/browse/SPARK-44479)[CONNECT][PYTHON] Fixed `protobuf` conversion from an empty struct type.\n+ [[SPARK-44464]](https://issues.apache.org/jira/browse/SPARK-44464)[SS] Fixed `applyInPandasWithStatePythonRunner` to output rows that have `Null` as first column value.\n+ Miscellaneous bug fixes.\n* July 29, 2023 \n+ Fixed a bug where `dbutils.fs.ls()` returned `INVALID_PARAMETER_VALUE.LOCATION_OVERLAP` when called for a storage location path that clashed with another external or managed storage location.\n+ [[SPARK-44199]](https://issues.apache.org/jira/browse/SPARK-44199) `CacheManager` no longer refreshes the `fileIndex` unnecessarily.\n+ Operating system security updates.\n* July 24, 2023 \n+ [[SPARK-44337]](https://issues.apache.org/jira/browse/SPARK-44337)[PROTOBUF] Fixed an issue where any field set to `Any.getDefaultInstance` caused parse errors.\n+ [[SPARK-44136]](https://issues.apache.org/jira/browse/SPARK-44136) [SS] Fixed an issue where `StateManager` would get materialized in an executor instead of the driver in `FlatMapGroupsWithStateExec`.\n+ Revert [[SPARK-42323]](https://issues.apache.org/jira/browse/SPARK-42323)[SQL] Assign name to `_LEGACY_ERROR_TEMP_2332`.\n+ Operating system security updates.\n* June 23, 2023 \n+ Operating system security updates.\n* June 15, 2023 \n+ Photonized `approx_count_distinct`.\n+ Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.\n+ 
[[SPARK-43156]](https://issues.apache.org/jira/browse/SPARK-43156)[[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098)[SQL] Extend scalar subquery count bug test with decorrelateInnerQuery disabled\n+ [[SPARK-43779]](https://issues.apache.org/jira/browse/SPARK-43779)[SQL] `ParseToDate` now loads `EvalMode` in the main thread.\n+ [[SPARK-42937]](https://issues.apache.org/jira/browse/SPARK-42937)[SQL] `PlanSubqueries` should set `InSubqueryExec#shouldBroadcast` to true\n+ Operating system security updates.\n* June 2, 2023 \n+ The JSON parser in `failOnUnknownFields` mode drops a record in `DROPMALFORMED` mode and fails directly in `FAILFAST` mode.\n+ Improve the performance of incremental update with `SHALLOW CLONE` Iceberg and Parquet.\n+ Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.\n+ [[SPARK-43404]](https://issues.apache.org/jira/browse/SPARK-43404)[Backport] Skip reusing sst file for same version of RocksDB state store to avoid ID mismatch error.\n+ [[SPARK-43340]](https://issues.apache.org/jira/browse/SPARK-43340)[CORE] Fixed missing stack trace field in eventlogs.\n+ [[SPARK-43300]](https://issues.apache.org/jira/browse/SPARK-43300)[CORE] `NonFateSharingCache` wrapper for Guava Cache.\n+ [[SPARK-43378]](https://issues.apache.org/jira/browse/SPARK-43378)[CORE] Properly close stream objects in `deserializeFromChunkedBuffer`.\n+ [[SPARK-16484]](https://issues.apache.org/jira/browse/SPARK-16484)[SQL] Use 8-bit registers for representing DataSketches.\n+ [[SPARK-43522]](https://issues.apache.org/jira/browse/SPARK-43522)[SQL] Fixed creating struct column name with index of array.\n+ [[SPARK-43413]](https://issues.apache.org/jira/browse/SPARK-43413)[11.3-13.0][SQL] Fixed `IN` subquery `ListQuery` nullability.\n+ 
[[SPARK-43043]](https://issues.apache.org/jira/browse/SPARK-43043)[CORE] Improved `MapOutputTracker.updateMapOutput` performance.\n+ [[SPARK-16484]](https://issues.apache.org/jira/browse/SPARK-16484)[SQL] Added support for DataSketches HllSketch.\n+ [[SPARK-43123]](https://issues.apache.org/jira/browse/SPARK-43123)[SQL] Internal field metadata no longer leaks to catalogs.\n+ [[SPARK-42851]](https://issues.apache.org/jira/browse/SPARK-42851)[SQL] Guard `EquivalentExpressions.addExpr()` with `supportedExpression()`.\n+ [[SPARK-43336]](https://issues.apache.org/jira/browse/SPARK-43336)[SQL] Casting between `Timestamp` and `TimestampNTZ` requires timezone.\n+ [[SPARK-43286]](https://issues.apache.org/jira/browse/SPARK-43286)[SQL] Updated `aes_encrypt` CBC mode to generate random IVs.\n+ [[SPARK-42852]](https://issues.apache.org/jira/browse/SPARK-42852)[SQL] Reverted `NamedLambdaVariable` related changes from `EquivalentExpressions`.\n+ [[SPARK-43541]](https://issues.apache.org/jira/browse/SPARK-43541)[SQL] Propagate all `Project` tags in resolving of expressions and missing columns..\n+ [[SPARK-43527]](https://issues.apache.org/jira/browse/SPARK-43527)[PYTHON] Fixed `catalog.listCatalogs` in PySpark.\n+ Operating system security updates.\n* May 31, 2023 \n+ Default optimized write support for Delta tables registered in Unity Catalog has expanded to include `CTAS` statements and `INSERT` operations for partitioned tables. This behavior aligns to defaults on SQL warehouses. See [Optimized writes for Delta Lake on Databricks](https://docs.databricks.com/delta/tune-file-size.html#optimized-writes).\n* May 17, 2023 \n+ Fixed a regression where `_metadata.file_path` and `_metadata.file_name` would return incorrectly formatted strings. 
For example, a path with spaces is now represented as `s3://test-bucket/some%20directory/some%20data.csv` instead of `s3://test-bucket/some directory/some data.csv`.\n+ Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower the batch size, which is lowered again on task retries as a final safety net.\n+ If an Avro file was read with just the `failOnUnknownFields` option or with Auto Loader in the `failOnNewColumns` schema evolution mode, columns that have different data types would be read as `null` instead of throwing an error stating that the file cannot be read. These reads now fail and recommend using the `rescuedDataColumn` option.\n+ Auto Loader now does the following.\n+ - Correctly reads and no longer rescues `Integer`, `Short`, `Byte` types if one of these data types is provided, but the Avro file suggests one of the other two types.\n+ - Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.\n+ - Prevents reading `Decimal` types with lower precision.\n+ [[SPARK-43172]](https://issues.apache.org/jira/browse/SPARK-43172) [CONNECT] Exposes host and token from Spark connect client.\n+ [[SPARK-43293]](https://issues.apache.org/jira/browse/SPARK-43293)[SQL] `__qualified_access_only` is ignored in normal columns.\n+ [[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098)[SQL] Fixed correctness `COUNT` bug when scalar subquery has a group by clause.\n+ [[SPARK-43085]](https://issues.apache.org/jira/browse/SPARK-43085)[SQL] Support for column `DEFAULT` assignment for multi-part table names.\n+ [[SPARK-43190]](https://issues.apache.org/jira/browse/SPARK-43190)[SQL] `ListQuery.childOutput` is now consistent with secondary output.\n+ [[SPARK-43192]](https://issues.apache.org/jira/browse/SPARK-43192) [CONNECT] Removed user agent charset validation.\n* April 25, 2023 \n+ You can modify a Delta 
table to add support for a Delta table feature using `DeltaTable.addFeatureSupport(feature_name)`.\n+ The `SYNC` command now supports legacy data source formats.\n+ Fixed a bug where using the Python formatter before running any other commands in a Python notebook could cause the notebook path to be missing from `sys.path`.\n+ Databricks now supports specifying default values for columns of Delta tables. `INSERT`, `UPDATE`, `DELETE`, and `MERGE` commands can refer to a column\u2019s default value using the explicit `DEFAULT` keyword. For `INSERT` commands with an explicit list of fewer columns than the target table, corresponding column default values are substituted for the remaining columns (or `NULL` if no default is specified).\n+ Fixed a bug where the web terminal could not be used to access files in `/Workspace` for some users.\n+ If a Parquet file was read with just the `failOnUnknownFields` option or with Auto Loader in the `failOnNewColumns` schema evolution mode, columns that had different data types would be read as `null` instead of throwing an error stating that the file cannot be read. These reads now fail and recommend using the `rescuedDataColumn` option.\n+ Auto Loader now correctly reads and no longer rescues `Integer`, `Short`, `Byte` types if one of these data types is provided but the Parquet file suggests one of the other two types. 
Previously, when the rescued data column was enabled, the data type mismatch would cause columns to be rescued even though they were readable.\n+ Fixed a bug where Auto Loader schema evolution could go into an infinite fail loop when a new column was detected in the schema of a nested JSON object.\n+ [[SPARK-42794]](https://issues.apache.org/jira/browse/SPARK-42794)[SS] Increase the `lockAcquireTimeoutMs` to 2 minutes for acquiring the RocksDB state store in Structured Streaming.\n+ [[SPARK-39221]](https://issues.apache.org/jira/browse/SPARK-39221)[SQL] Make sensitive information be redacted correctly for thrift server job/stage tab.\n+ [[SPARK-42971]](https://issues.apache.org/jira/browse/SPARK-42971)[CORE] Change to print `workdir` if `appDirs` is null when the worker handles a `WorkDirCleanup` event.\n+ [[SPARK-42936]](https://issues.apache.org/jira/browse/SPARK-42936)[SQL] Fix LCA bug when the having clause can be resolved directly by its child aggregate.\n+ [[SPARK-43018]](https://issues.apache.org/jira/browse/SPARK-43018)[SQL] Fix bug for `INSERT` commands with timestamp literals.\n+ Revert [[SPARK-42754]](https://issues.apache.org/jira/browse/SPARK-42754)[SQL][UI] Fix backward compatibility issue in nested SQL run.\n+ Revert [[SPARK-41498]](https://issues.apache.org/jira/browse/SPARK-41498) Propagate metadata through Union.\n+ [[SPARK-43038]](https://issues.apache.org/jira/browse/SPARK-43038)[SQL] Support the CBC mode by `aes_encrypt()`/`aes_decrypt()`.\n+ [[SPARK-42928]](https://issues.apache.org/jira/browse/SPARK-42928)[SQL] Make `resolvePersistentFunction` synchronized.\n+ [[SPARK-42521]](https://issues.apache.org/jira/browse/SPARK-42521)[SQL] Add `NULL` values for `INSERT` with user-specified lists of fewer columns than the target table.\n+ [[SPARK-41391]](https://issues.apache.org/jira/browse/SPARK-41391)[SQL] The output column name of `groupBy.agg(count_distinct)` was incorrect.\n+ [[SPARK-42548]](https://issues.apache.org/jira/browse/SPARK-42548)[SQL] Add 
`ReferenceAllColumns` to skip rewriting attributes.\n+ [[SPARK-42423]](https://issues.apache.org/jira/browse/SPARK-42423)[SQL] Add metadata column file block start and length.\n+ [[SPARK-42796]](https://issues.apache.org/jira/browse/SPARK-42796)[SQL] Support accessing `TimestampNTZ` columns in `CachedBatch`.\n+ [[SPARK-42266]](https://issues.apache.org/jira/browse/SPARK-42266)[PYTHON] Remove the parent directory in shell.py run when IPython is used.\n+ [[SPARK-43011]](https://issues.apache.org/jira/browse/SPARK-43011)[SQL] `array_insert` should fail with 0 index.\n+ [[SPARK-41874]](https://issues.apache.org/jira/browse/SPARK-41874)[CONNECT][PYTHON] Support `SameSemantics` in Spark Connect.\n+ [[SPARK-42702]](https://issues.apache.org/jira/browse/SPARK-42702)[[SPARK-42623]](https://issues.apache.org/jira/browse/SPARK-42623)[SQL] Support parameterized query in subquery and CTE.\n+ [[SPARK-42967]](https://issues.apache.org/jira/browse/SPARK-42967)[CORE] Fix `SparkListenerTaskStart.stageAttemptId` when a task is started after the stage is cancelled.\n+ Operating system security updates. \n### [Databricks Runtime 12.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id8) \nSee [Databricks Runtime 12.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/12.1.html). 
\n* June 23, 2023 \n+ Operating system security updates.\n* June 15, 2023 \n+ Photonized `approx_count_distinct`.\n+ Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.\n+ [[SPARK-43779]](https://issues.apache.org/jira/browse/SPARK-43779)[SQL] `ParseToDate` now loads `EvalMode` in the main thread.\n+ [[SPARK-43156]](https://issues.apache.org/jira/browse/SPARK-43156)[[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098)[SQL] Extend scalar subquery count bug test with decorrelateInnerQuery disabled\n+ Operating system security updates.\n* June 2, 2023 \n+ The JSON parser in `failOnUnknownFields` mode drops a record in `DROPMALFORMED` mode and fails directly in `FAILFAST` mode.\n+ Improve the performance of incremental update with `SHALLOW CLONE` Iceberg and Parquet.\n+ Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.\n+ [[SPARK-43404]](https://issues.apache.org/jira/browse/SPARK-43404)[Backport] Skip reusing sst file for same version of RocksDB state store to avoid ID mismatch error.\n+ [[SPARK-43413]](https://issues.apache.org/jira/browse/SPARK-43413)[11.3-13.0][SQL] Fixed `IN` subquery `ListQuery` nullability.\n+ [[SPARK-43522]](https://issues.apache.org/jira/browse/SPARK-43522)[SQL] Fixed creating struct column name with index of array.\n+ [[SPARK-42444]](https://issues.apache.org/jira/browse/SPARK-42444)[PYTHON] `DataFrame.drop` now handles duplicated columns properly.\n+ [[SPARK-43541]](https://issues.apache.org/jira/browse/SPARK-43541)[SQL] Propagate all `Project` tags in resolving of expressions and missing columns..\n+ [[SPARK-43340]](https://issues.apache.org/jira/browse/SPARK-43340)[CORE] Fixed missing stack trace field in eventlogs.\n+ [[SPARK-42937]](https://issues.apache.org/jira/browse/SPARK-42937)[SQL] 
`PlanSubqueries` now sets `InSubqueryExec#shouldBroadcast` to true.\n+ [[SPARK-43527]](https://issues.apache.org/jira/browse/SPARK-43527)[PYTHON] Fixed `catalog.listCatalogs` in PySpark.\n+ [[SPARK-43378]](https://issues.apache.org/jira/browse/SPARK-43378)[CORE] Properly close stream objects in `deserializeFromChunkedBuffer`.\n* May 17, 2023 \n+ Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower the batch size, which is lowered again on task retries as a final safety net.\n+ If an Avro file was read with just the `failOnUnknownFields` option or with Auto Loader in the `failOnNewColumns` schema evolution mode, columns that have different data types would be read as `null` instead of throwing an error stating that the file cannot be read. These reads now fail and recommend using the `rescuedDataColumn` option.\n+ Auto Loader now does the following.\n+ - Correctly reads and no longer rescues `Integer`, `Short`, `Byte` types if one of these data types is provided, but the Avro file suggests one of the other two types.\n+ - Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.\n+ - Prevents reading `Decimal` types with lower precision.\n+ [[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098)[SQL] Fixed correctness `COUNT` bug when scalar subquery has a group by clause.\n+ [[SPARK-43190]](https://issues.apache.org/jira/browse/SPARK-43190)[SQL] `ListQuery.childOutput` is now consistent with secondary output.\n+ Operating system security updates.\n* April 25, 2023 \n+ If a Parquet file was read with just the `failOnUnknownFields` option or with Auto Loader in the `failOnNewColumns` schema evolution mode, columns that had different data types would be read as `null` instead of throwing an error stating that the file cannot be read. 
These reads now fail and recommend using the `rescuedDataColumn` option.\n+ Auto Loader now correctly reads and no longer rescues `Integer`, `Short`, `Byte` types if one of these data types is provided but the Parquet file suggests one of the other two types. Previously, when the rescued data column was enabled, the data type mismatch would cause columns to be rescued even though they were readable.\n+ [[SPARK-43009]](https://issues.apache.org/jira/browse/SPARK-43009)[SQL] Parameterized `sql()` with `Any` constants.\n+ [[SPARK-42971]](https://issues.apache.org/jira/browse/SPARK-42971)[CORE] Change to print `workdir` if `appDirs` is null when the worker handles a `WorkDirCleanup` event.\n+ Operating system security updates.\n* April 11, 2023 \n+ Support legacy data source formats in the `SYNC` command.\n+ Fixed a bug in the `%autoreload` behavior in notebooks that are outside of a repo.\n+ Fixed a bug where Auto Loader schema evolution could go into an infinite fail loop when a new column was detected in the schema of a nested JSON object.\n+ [[SPARK-42928]](https://issues.apache.org/jira/browse/SPARK-42928)[SQL] Makes `resolvePersistentFunction` synchronized.\n+ [[SPARK-42967]](https://issues.apache.org/jira/browse/SPARK-42967)[CORE] Fixes `SparkListenerTaskStart.stageAttemptId` when a task starts after the stage is cancelled.\n+ Operating system security updates.\n* March 29, 2023 \n+ Auto Loader now triggers at least one synchronous RocksDB log cleanup for `Trigger.AvailableNow` streams to ensure that the checkpoint can get regularly cleaned up for fast-running Auto Loader streams. 
This can cause some streams to take longer before they shut down, but will save you storage costs and improve the Auto Loader experience in future runs.\n+ You can now modify a Delta table to add support to table features using `DeltaTable.addFeatureSupport(feature_name)`.\n+ [[SPARK-42702]](https://issues.apache.org/jira/browse/SPARK-42702)[[SPARK-42623]](https://issues.apache.org/jira/browse/SPARK-42623)[SQL] Support parameterized query in subquery and CTE\n+ [[SPARK-41162]](https://issues.apache.org/jira/browse/SPARK-41162)[SQL] Fix anti- and semi-join for self-join with aggregations\n+ [[SPARK-42403]](https://issues.apache.org/jira/browse/SPARK-42403)[CORE] JsonProtocol should handle null JSON strings\n+ [[SPARK-42668]](https://issues.apache.org/jira/browse/SPARK-42668)[SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort\n+ [[SPARK-42794]](https://issues.apache.org/jira/browse/SPARK-42794)[SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structure Streaming\n* March 14, 2023 \n+ There is a terminology change for adding features to a Delta table using the table property. The preferred syntax is now `'delta.feature.featureName'='supported'` instead of `'delta.feature.featureName'='enabled'`. 
For backwards compatibility, using `'delta.feature.featureName'='enabled'` still works and will continue to work.\n+ [[SPARK-42622]](https://issues.apache.org/jira/browse/SPARK-42622)[CORE] Disable substitution in values\n+ [[SPARK-42534]](https://issues.apache.org/jira/browse/SPARK-42534)[SQL] Fix DB2Dialect Limit clause\n+ [[SPARK-42635]](https://issues.apache.org/jira/browse/SPARK-42635)[SQL] Fix the TimestampAdd expression.\n+ [[SPARK-42516]](https://issues.apache.org/jira/browse/SPARK-42516)[SQL] Always capture the session time zone config while creating views\n+ [[SPARK-42484]](https://issues.apache.org/jira/browse/SPARK-42484) [SQL] UnsafeRowUtils better error message\n+ [[SPARK-41793]](https://issues.apache.org/jira/browse/SPARK-41793)[SQL] Incorrect result for window frames defined by a range clause on large decimals\n+ Operating system security updates.\n* February 24, 2023 \n+ You can now use a unified set of options (`host`, `port`, `database`, `user`, `password`) for connecting to the data sources supported in Query Federation (PostgreSQL, MySQL, Synapse, Snowflake, Redshift, SQL Server). 
Note that `port` is optional and uses the default port number for each data source if not provided.**Example of PostgreSQL connection configuration** \n```\nCREATE TABLE postgresql_table\nUSING postgresql\nOPTIONS (\ndbtable '',\nhost '',\ndatabase '',\nuser '',\npassword secret('scope', 'key')\n);\n\n``` \n**Example of Snowflake connection configuration** \n```\nCREATE TABLE snowflake_table\nUSING snowflake\nOPTIONS (\ndbtable '',\nhost '',\nport '',\ndatabase '',\nuser secret('snowflake_creds', 'my_username'),\npassword secret('snowflake_creds', 'my_password'),\nschema '',\nsfWarehouse ''\n);\n\n``` \n+ [[SPARK-41989]](https://issues.apache.org/jira/browse/SPARK-41989)[PYTHON] Avoid breaking logging config from pyspark.pandas\n+ [[SPARK-42346]](https://issues.apache.org/jira/browse/SPARK-42346)[SQL] Rewrite distinct aggregates after subquery merge\n+ [[SPARK-41990]](https://issues.apache.org/jira/browse/SPARK-41990)[SQL] Use `FieldReference.column` instead of `apply` in V1 to V2 filter conversion\n+ Revert [[SPARK-41848]](https://issues.apache.org/jira/browse/SPARK-41848)[CORE] Fixing task over-scheduled with TaskResourceProfile\n+ [[SPARK-42162]](https://issues.apache.org/jira/browse/SPARK-42162) Introduce MultiCommutativeOp expression as a memory optimization for canonicalizing large trees of commutative expressions\n+ Operating system security updates.\n* February 16, 2023 \n+ SYNC command supports syncing recreated Hive Metastore tables. 
If an HMS table was previously synced to Unity Catalog but then dropped and recreated, a subsequent re-sync now works instead of throwing a `TABLE_ALREADY_EXISTS` status code.\n+ [[SPARK-41219]](https://issues.apache.org/jira/browse/SPARK-41219)[SQL] IntegralDivide use decimal(1, 0) to represent 0\n+ [[SPARK-36173]](https://issues.apache.org/jira/browse/SPARK-36173)[CORE] Support getting CPU number in TaskContext\n+ [[SPARK-41848]](https://issues.apache.org/jira/browse/SPARK-41848)[CORE] Fixing task over-scheduled with TaskResourceProfile\n+ [[SPARK-42286]](https://issues.apache.org/jira/browse/SPARK-42286)[SQL] Fallback to previous codegen code path for complex expr with CAST\n* January 31, 2023 \n+ Creating a schema with a defined location now requires the user to have `SELECT` and `MODIFY` privileges on `ANY FILE`.\n+ [[SPARK-41581]](https://issues.apache.org/jira/browse/SPARK-41581)[SQL] Assign name to `_LEGACY_ERROR_TEMP_1230`\n+ [[SPARK-41996]](https://issues.apache.org/jira/browse/SPARK-41996)[SQL][SS] Fix kafka test to verify lost partitions to account for slow Kafka operations\n+ [[SPARK-41580]](https://issues.apache.org/jira/browse/SPARK-41580)[SQL] Assign name to `_LEGACY_ERROR_TEMP_2137`\n+ [[SPARK-41666]](https://issues.apache.org/jira/browse/SPARK-41666)[PYTHON] Support parameterized SQL by `sql()`\n+ [[SPARK-41579]](https://issues.apache.org/jira/browse/SPARK-41579)[SQL] Assign name to `_LEGACY_ERROR_TEMP_1249`\n+ [[SPARK-41573]](https://issues.apache.org/jira/browse/SPARK-41573)[SQL] Assign name to `_LEGACY_ERROR_TEMP_2136`\n+ [[SPARK-41574]](https://issues.apache.org/jira/browse/SPARK-41574)[SQL] Assign name to `_LEGACY_ERROR_TEMP_2009`\n+ [[SPARK-41049]](https://issues.apache.org/jira/browse/SPARK-41049)[Followup] Fix a code sync regression for ConvertToLocalRelation\n+ [[SPARK-41576]](https://issues.apache.org/jira/browse/SPARK-41576)[SQL] Assign name to `_LEGACY_ERROR_TEMP_2051`\n+ 
[[SPARK-41572]](https://issues.apache.org/jira/browse/SPARK-41572)[SQL] Assign name to `_LEGACY_ERROR_TEMP_2149`\n+ [[SPARK-41575]](https://issues.apache.org/jira/browse/SPARK-41575)[SQL] Assign name to `_LEGACY_ERROR_TEMP_2054`\n+ Operating system security updates. \n### [Databricks Runtime 12.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id9) \nSee [Databricks Runtime 12.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/12.0.html). \n* June 15, 2023 \n+ Photonized `approx_count_distinct`.\n+ Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.\n+ [[SPARK-43156]](https://issues.apache.org/jira/browse/SPARK-43156)[[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098)[SQL] Extend scalar subquery count bug test with decorrelateInnerQuery disabled\n+ [[SPARK-43779]](https://issues.apache.org/jira/browse/SPARK-43779)[SQL] `ParseToDate` now loads `EvalMode` in the main thread.\n+ Operating system security updates.\n* June 2, 2023 \n+ The JSON parser in `failOnUnknownFields` mode drops a record in `DROPMALFORMED` mode and fails directly in `FAILFAST` mode.\n+ Improve the performance of incremental update with `SHALLOW CLONE` Iceberg and Parquet.\n+ Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. 
This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.\n+ [[SPARK-42444]](https://issues.apache.org/jira/browse/SPARK-42444)[PYTHON] `DataFrame.drop` now handles duplicated columns properly.\n+ [[SPARK-43404]](https://issues.apache.org/jira/browse/SPARK-43404)[Backport] Skip reusing sst file for same version of RocksDB state store to avoid ID mismatch error.\n+ [[SPARK-43413]](https://issues.apache.org/jira/browse/SPARK-43413)[11.3-13.0][SQL] Fixed `IN` subquery `ListQuery` nullability.\n+ [[SPARK-43527]](https://issues.apache.org/jira/browse/SPARK-43527)[PYTHON] Fixed `catalog.listCatalogs` in PySpark.\n+ [[SPARK-43522]](https://issues.apache.org/jira/browse/SPARK-43522)[SQL] Fixed creating struct column name with index of array.\n+ [[SPARK-43541]](https://issues.apache.org/jira/browse/SPARK-43541)[SQL] Propagate all `Project` tags in resolving of expressions and missing columns.\n+ [[SPARK-43340]](https://issues.apache.org/jira/browse/SPARK-43340)[CORE] Fixed missing stack trace field in eventlogs.\n+ [[SPARK-42937]](https://issues.apache.org/jira/browse/SPARK-42937)[SQL] `PlanSubqueries` now sets `InSubqueryExec#shouldBroadcast` to true.\n* May 17, 2023 \n+ Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower the batch size, which is lowered again on task retries as a final safety net.\n+ If an Avro file was read with just the `failOnUnknownFields` option or with Auto Loader in the `failOnNewColumns` schema evolution mode, columns that have different data types would be read as `null` instead of throwing an error stating that the file cannot be read. 
These reads now fail and recommend using the `rescuedDataColumn` option.\n+ Auto Loader now does the following.\n+ - Correctly reads and no longer rescues `Integer`, `Short`, `Byte` types if one of these data types is provided, but the Avro file suggests one of the other two types.\n+ - Prevents reading interval types as date or timestamp types to avoid getting corrupt dates.\n+ - Prevents reading `Decimal` types with lower precision.\n+ [[SPARK-43172]](https://issues.apache.org/jira/browse/SPARK-43172) [CONNECT] Exposes host and token from Spark connect client.\n+ [[SPARK-41520]](https://issues.apache.org/jira/browse/SPARK-41520)[SQL] Split `AND_OR` tree pattern to separate `AND` and `OR`.\n+ [[SPARK-43098]](https://issues.apache.org/jira/browse/SPARK-43098)[SQL] Fixed correctness `COUNT` bug when scalar subquery has a group by clause.\n+ [[SPARK-43190]](https://issues.apache.org/jira/browse/SPARK-43190)[SQL] `ListQuery.childOutput` is now consistent with secondary output.\n+ Operating system security updates.\n* April 25, 2023 \n+ If a Parquet file was read with just the `failOnUnknownFields` option or with Auto Loader in the `failOnNewColumns` schema evolution mode, columns that had different data types would be read as `null` instead of throwing an error stating that the file cannot be read. These reads now fail and recommend using the `rescuedDataColumn` option.\n+ Auto Loader now correctly reads and no longer rescues `Integer`, `Short`, `Byte` types if one of these data types is provided but the Parquet file suggests one of the other two types. 
When the rescued data column was previously enabled, the data type mismatch would cause columns to be rescued even though they were readable.\n+ [[SPARK-42971]](https://issues.apache.org/jira/browse/SPARK-42971)[CORE] Change to print `workdir` if `appDirs` is null when worker handle `WorkDirCleanup` event\n+ Operating system security updates.\n* April 11, 2023 \n+ Support legacy data source formats in `SYNC` command.\n+ Fixes a bug in the %autoreload behavior in notebooks which are outside of a repo.\n+ Fixed a bug where Auto Loader schema evolution can go into an infinite fail loop, when a new column is detected in the schema of a nested JSON object.\n+ [[SPARK-42928]](https://issues.apache.org/jira/browse/SPARK-42928)[SQL] Makes `resolvePersistentFunction` synchronized.\n+ [[SPARK-42967]](https://issues.apache.org/jira/browse/SPARK-42967)[CORE] Fixes `SparkListenerTaskStart.stageAttemptId` when a task starts after the stage is cancelled.\n+ Operating system security updates.\n* March 29, 2023 \n+ [[SPARK-42794]](https://issues.apache.org/jira/browse/SPARK-42794)[SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structure Streaming\n+ [[SPARK-41162]](https://issues.apache.org/jira/browse/SPARK-41162)[SQL] Fix anti- and semi-join for self-join with aggregations\n+ [[SPARK-42403]](https://issues.apache.org/jira/browse/SPARK-42403)[CORE] JsonProtocol should handle null JSON strings\n+ [[SPARK-42668]](https://issues.apache.org/jira/browse/SPARK-42668)[SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort\n+ Miscellaneous bug fixes.\n* March 14, 2023 \n+ [[SPARK-42534]](https://issues.apache.org/jira/browse/SPARK-42534)[SQL] Fix DB2Dialect Limit clause\n+ [[SPARK-42622]](https://issues.apache.org/jira/browse/SPARK-42622)[CORE] Disable substitution in values\n+ [[SPARK-41793]](https://issues.apache.org/jira/browse/SPARK-41793)[SQL] Incorrect result for window frames defined by a range clause 
on large decimals\n+ [[SPARK-42484]](https://issues.apache.org/jira/browse/SPARK-42484) [SQL] UnsafeRowUtils better error message\n+ [[SPARK-42635]](https://issues.apache.org/jira/browse/SPARK-42635)[SQL] Fix the TimestampAdd expression.\n+ [[SPARK-42516]](https://issues.apache.org/jira/browse/SPARK-42516)[SQL] Always capture the session time zone config while creating views\n+ Operating system security updates.\n* February 24, 2023 \n+ Standardized Connection Options for Query Federation \nYou can now use a unified set of options (`host`, `port`, `database`, `user`, `password`) for connecting to the data sources supported in Query Federation (PostgreSQL, MySQL, Synapse, Snowflake, Redshift, SQL Server). Note that `port` is optional and will use the default port number for each data source if not provided. \n**Example of PostgreSQL connection configuration** \n```\nCREATE TABLE postgresql_table\nUSING postgresql\nOPTIONS (\ndbtable '',\nhost '',\ndatabase '',\nuser '',\npassword secret('scope', 'key')\n);\n\n``` \n**Example of Snowflake connection configuration** \n```\nCREATE TABLE snowflake_table\nUSING snowflake\nOPTIONS (\ndbtable '',\nhost '',\nport '',\ndatabase '',\nuser secret('snowflake_creds', 'my_username'),\npassword secret('snowflake_creds', 'my_password'),\nschema '',\nsfWarehouse ''\n);\n\n```\n+ Revert [[SPARK-41848]](https://issues.apache.org/jira/browse/SPARK-41848)[CORE] Fixing task over-scheduled with TaskResourceProfile\n+ [[SPARK-42162]](https://issues.apache.org/jira/browse/SPARK-42162) Introduce MultiCommutativeOp expression as a memory optimization for canonicalizing large trees of commutative expressions\n+ [[SPARK-41990]](https://issues.apache.org/jira/browse/SPARK-41990)[SQL] Use `FieldReference.column` instead of `apply` in V1 to V2 filter conversion\n+ [[SPARK-42346]](https://issues.apache.org/jira/browse/SPARK-42346)[SQL] Rewrite distinct aggregates after subquery merge\n+ Operating system security updates.\n* February 16, 2023 \n+ 
Users can now read and write certain Delta tables that require Reader version 3 and Writer version 7, by using Databricks Runtime 9.1 or later. To succeed, table features listed in the tables\u2019 protocol must be supported by the current version of Databricks Runtime.\n+ SYNC command supports syncing recreated Hive Metastore tables. If a HMS table has been SYNCed previously to Unity Catalog but then dropped and recreated, a subsequent re-sync will work instead of throwing TABLE\\_ALREADY\\_EXISTS status code.\n+ [[SPARK-36173]](https://issues.apache.org/jira/browse/SPARK-36173)[CORE] Support getting CPU number in TaskContext\n+ [[SPARK-42286]](https://issues.apache.org/jira/browse/SPARK-42286)[SQL] Fallback to previous codegen code path for complex expr with CAST\n+ [[SPARK-41848]](https://issues.apache.org/jira/browse/SPARK-41848)[CORE] Fixing task over-scheduled with TaskResourceProfile\n+ [[SPARK-41219]](https://issues.apache.org/jira/browse/SPARK-41219)[SQL] IntegralDivide use decimal(1, 0) to represent 0\n* January 25, 2023 \n+ [[SPARK-41660]](https://issues.apache.org/jira/browse/SPARK-41660)[SQL] Only propagate metadata columns if they are used\n+ [[SPARK-41379]](https://issues.apache.org/jira/browse/SPARK-41379)[SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark\n+ [[SPARK-41669]](https://issues.apache.org/jira/browse/SPARK-41669)[SQL] Early pruning in canCollapseExpressions\n+ Operating system security updates.\n* January 18, 2023 \n+ `REFRESH FUNCTION` SQL command now supports SQL functions and SQL Table functions. For example, the command could be used to refresh a persistent SQL function that was updated in another SQL session.\n+ Java Database Connectivity (JDBC) data source v1 now supports LIMIT clause pushdown to improve performance in queries. 
This feature is enabled by default and can be disabled with `spark.databricks.optimizer.jdbcDSv1LimitPushdown.enabled` set to `false`.\n+ In Legacy Table ACLs clusters, creating functions that reference JVM classes now requires the `MODIFY_CLASSPATH` privilege.\n+ Azure Synapse connector now returns a more descriptive error message when a column name contains invalid characters such as whitespaces or semicolons. In such cases, the following message will be returned: `Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Make sure column names do not include any invalid characters such as ';' or whitespace`.\n+ Spark Structured Streaming now works with `format(\"deltasharing\")` on a Delta Sharing table as a source.\n+ [[SPARK-38277]](https://issues.apache.org/jira/browse/SPARK-38277)[SS] Clear write batch after RocksDB state store\u2019s commit\n+ [[SPARK-41733]](https://issues.apache.org/jira/browse/SPARK-41733)[SQL][SS] Apply tree-pattern based pruning for the rule ResolveWindowTime\n+ [[SPARK-39591]](https://issues.apache.org/jira/browse/SPARK-39591)[SS] Async Progress Tracking\n+ [[SPARK-41339]](https://issues.apache.org/jira/browse/SPARK-41339)[SQL] Close and recreate RocksDB write batch instead of just clearing\n+ [[SPARK-41198]](https://issues.apache.org/jira/browse/SPARK-41198)[SS] Fix metrics in streaming query having CTE and DSv1 streaming source\n+ [[SPARK-41539]](https://issues.apache.org/jira/browse/SPARK-41539)[SQL] Remap stats and constraints against output in logical plan for LogicalRDD\n+ [[SPARK-41732]](https://issues.apache.org/jira/browse/SPARK-41732)[SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing\n+ 
[[SPARK-41862]](https://issues.apache.org/jira/browse/SPARK-41862)[SQL] Fix correctness bug related to DEFAULT values in Orc reader\n+ [[SPARK-41199]](https://issues.apache.org/jira/browse/SPARK-41199)[SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used\n+ [[SPARK-41261]](https://issues.apache.org/jira/browse/SPARK-41261)[PYTHON][SS] Fix issue for applyInPandasWithState when the columns of grouping keys are not placed in order from earliest\n+ Operating system security updates.\n* May 17, 2023 \n+ Parquet scans are now robust against OOMs when scanning exceptionally structured files by dynamically adjusting batch size. File metadata is analyzed to preemptively lower batch size and is lowered again on task retries as a final safety net.\n+ Fixed a regression that caused Databricks jobs to persist after failing to connect to the metastore during cluster initialization.\n+ [[SPARK-41520]](https://issues.apache.org/jira/browse/SPARK-41520)[SQL] Split `AND_OR` tree pattern to separate `AND` and `OR`.\n+ [[SPARK-43190]](https://issues.apache.org/jira/browse/SPARK-43190)[SQL] `ListQuery.childOutput` is now consistent with secondary output.\n+ Operating system security updates.\n* April 25, 2023 \n+ If a Parquet file was read with just the `failOnUnknownFields` option or with Auto Loader in the `failOnNewColumns` schema evolution mode, columns that had different data types would be read as `null` instead of throwing an error stating that the file cannot be read. These reads now fail and recommend users to use the `rescuedDataColumn` option.\n+ Auto Loader now correctly reads and no longer rescues `Integer`, `Short`, `Byte` types if one of these data types are provided. The Parquet file suggests one of the other two types. 
When the rescued data column was previously enabled, the data type mismatch would cause columns to be rescued even though they were readable.\n+ [[SPARK-42937]](https://issues.apache.org/jira/browse/SPARK-42937)[SQL] `PlanSubqueries` now sets `InSubqueryExec#shouldBroadcast` to true.\n+ Operating system security updates.\n* April 11, 2023 \n+ Support legacy data source formats in SYNC command.\n+ Fixes a bug in the %autoreload behavior in notebooks which are outside of a repo.\n+ Fixed a bug where Auto Loader schema evolution can go into an infinite fail loop, when a new column is detected in the schema of a nested JSON object.\n+ [[SPARK-42928]](https://issues.apache.org/jira/browse/SPARK-42928)[SQL] Make resolvePersistentFunction synchronized.\n+ [[SPARK-42967]](https://issues.apache.org/jira/browse/SPARK-42967)[CORE] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is cancelled.\n* March 29, 2023 \n+ [[SPARK-42794]](https://issues.apache.org/jira/browse/SPARK-42794)[SS] Increase the lockAcquireTimeoutMs to 2 minutes for acquiring the RocksDB state store in Structure Streaming\n+ [[SPARK-42403]](https://issues.apache.org/jira/browse/SPARK-42403)[CORE] JsonProtocol should handle null JSON strings\n+ [[SPARK-42668]](https://issues.apache.org/jira/browse/SPARK-42668)[SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort\n+ Operating system security updates.\n* March 14, 2023 \n+ [[SPARK-42635]](https://issues.apache.org/jira/browse/SPARK-42635)[SQL] Fix the TimestampAdd expression.\n+ [[SPARK-41793]](https://issues.apache.org/jira/browse/SPARK-41793)[SQL] Incorrect result for window frames defined by a range clause on large decimals\n+ [[SPARK-42484]](https://issues.apache.org/jira/browse/SPARK-42484) [SQL] UnsafeRowUtils better error message\n+ [[SPARK-42534]](https://issues.apache.org/jira/browse/SPARK-42534)[SQL] Fix DB2Dialect Limit clause\n+ 
[[SPARK-41162]](https://issues.apache.org/jira/browse/SPARK-41162)[SQL] Fix anti- and semi-join for self-join with aggregations\n+ [[SPARK-42516]](https://issues.apache.org/jira/browse/SPARK-42516)[SQL] Always capture the session time zone config while creating views\n+ Miscellaneous bug fixes.\n* February 28, 2023 \n+ Standardized Connection Options for Query Federation \nYou can now use a unified set of options (`host`, `port`, `database`, `user`, `password`) for connecting to the data sources supported in Query Federation (PostgreSQL, MySQL, Synapse, Snowflake, Redshift, SQL Server). Note that `port` is optional and uses the default port number for each data source if not provided. \n**Example of PostgreSQL connection configuration** \n```\nCREATE TABLE postgresql_table\nUSING postgresql\nOPTIONS (\ndbtable '',\nhost '',\ndatabase '',\nuser '',\npassword secret('scope', 'key')\n);\n\n``` \n**Example of Snowflake connection configuration** \n```\nCREATE TABLE snowflake_table\nUSING snowflake\nOPTIONS (\ndbtable '',\nhost '',\nport '',\ndatabase '',\nuser secret('snowflake_creds', 'my_username'),\npassword secret('snowflake_creds', 'my_password'),\nschema '',\nsfWarehouse ''\n);\n\n```\n+ [[SPARK-42286]](https://issues.apache.org/jira/browse/SPARK-42286)[SQL] Fallback to previous codegen code path for complex expr with CAST\n+ [[SPARK-41989]](https://issues.apache.org/jira/browse/SPARK-41989)[PYTHON] Avoid breaking logging config from pyspark.pandas\n+ [[SPARK-42346]](https://issues.apache.org/jira/browse/SPARK-42346)[SQL] Rewrite distinct aggregates after subquery merge\n+ [[SPARK-41360]](https://issues.apache.org/jira/browse/SPARK-41360)[CORE] Avoid BlockManager re-registration if the executor has been lost\n+ [[SPARK-42162]](https://issues.apache.org/jira/browse/SPARK-42162) Introduce MultiCommutativeOp expression as a memory optimization for canonicalizing large trees of commutative expressions\n+ 
[[SPARK-41990]](https://issues.apache.org/jira/browse/SPARK-41990)[SQL] Use `FieldReference.column` instead of `apply` in V1 to V2 filter conversion\n+ Operating system security updates.\n* February 16, 2023 \n+ Users can now read and write certain Delta tables that require Reader version 3 and Writer version 7, by using Databricks Runtime 9.1 or later. To succeed, table features listed in the tables\u2019 protocol must be supported by the current version of Databricks Runtime.\n+ SYNC command supports syncing recreated Hive Metastore tables. If a HMS table has been SYNCed previously to Unity Catalog but then dropped and recreated, a subsequent re-sync will work instead of throwing TABLE\\_ALREADY\\_EXISTS status code.\n+ [[SPARK-41219]](https://issues.apache.org/jira/browse/SPARK-41219)[SQL] IntegralDivide use decimal(1, 0) to represent 0\n+ [[SPARK-40382]](https://issues.apache.org/jira/browse/SPARK-40382)[SQL] Group distinct aggregate expressions by semantically equivalent children in `RewriteDistinctAggregates`\n+ Operating system security updates.\n* January 25, 2023 \n+ [[SPARK-41379]](https://issues.apache.org/jira/browse/SPARK-41379)[SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark\n+ [[SPARK-41660]](https://issues.apache.org/jira/browse/SPARK-41660)[SQL] Only propagate metadata columns if they are used\n+ [[SPARK-41669]](https://issues.apache.org/jira/browse/SPARK-41669)[SQL] Early pruning in canCollapseExpressions\n+ Miscellaneous bug fixes.\n* January 18, 2023 \n+ `REFRESH FUNCTION` SQL command now supports SQL functions and SQL Table functions. For example, the command could be used to refresh a persistent SQL function that was updated in another SQL session.\n+ Java Database Connectivity (JDBC) data source v1 now supports LIMIT clause pushdown to improve performance in queries. 
This feature is enabled by default and can be disabled with `spark.databricks.optimizer.jdbcDSv1LimitPushdown.enabled` set to `false`.\n+ Azure Synapse connector now returns a more descriptive error message when a column name contains invalid characters such as whitespaces or semicolons. In such cases, the following message will be returned: `Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Make sure column names do not include any invalid characters such as ';' or whitespace`.\n+ [[SPARK-41198]](https://issues.apache.org/jira/browse/SPARK-41198)[SS] Fix metrics in streaming query having CTE and DSv1 streaming source\n+ [[SPARK-41862]](https://issues.apache.org/jira/browse/SPARK-41862)[SQL] Fix correctness bug related to DEFAULT values in Orc reader\n+ [[SPARK-41539]](https://issues.apache.org/jira/browse/SPARK-41539)[SQL] Remap stats and constraints against output in logical plan for LogicalRDD\n+ [[SPARK-39591]](https://issues.apache.org/jira/browse/SPARK-39591)[SS] Async Progress Tracking\n+ [[SPARK-41199]](https://issues.apache.org/jira/browse/SPARK-41199)[SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used\n+ [[SPARK-41261]](https://issues.apache.org/jira/browse/SPARK-41261)[PYTHON][SS] Fix issue for applyInPandasWithState when the columns of grouping keys are not placed in order from earliest\n+ [[SPARK-41339]](https://issues.apache.org/jira/browse/SPARK-41339)[SQL] Close and recreate RocksDB write batch instead of just clearing\n+ [[SPARK-41732]](https://issues.apache.org/jira/browse/SPARK-41732)[SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing\n+ [[SPARK-38277]](https://issues.apache.org/jira/browse/SPARK-38277)[SS] 
Clear write batch after RocksDB state store\u2019s commit\n+ Operating system security updates.\n* November 29, 2022 \n+ Users can configure leading and trailing whitespaces\u2019 behavior when writing data using the Redshift connector. The following options have been added to control whitespace handling: \n- `csvignoreleadingwhitespace`, when set to `true`, removes leading whitespace from values during writes when `tempformat` is set to `CSV` or `CSV GZIP`. Whitespaces are retained when the config is set to `false`. By default, the value is `true`.\n- `csvignoretrailingwhitespace`, when set to `true`, removes trailing whitespace from values during writes when `tempformat` is set to `CSV` or `CSV GZIP`. Whitespaces are retained when the config is set to `false`. By default, the value is `true`.\n+ Fixed a bug with JSON parsing in Auto Loader when all columns were left as strings (`cloudFiles.inferColumnTypes` was not set or set to `false`) and the JSON contained nested objects.\n+ Upgraded `snowflake-jdbc` dependency to version 3.13.22.\n+ Table types of JDBC tables are now EXTERNAL by default.\n+ [[SPARK-40906]](https://issues.apache.org/jira/browse/SPARK-40906)[SQL] `Mode` should copy keys before inserting into Map\n+ Operating system security updates.\n* November 15, 2022 \n+ Table ACLs and UC Shared clusters now allow the `Dataset.toJSON` method from Python.\n+ [[SPARK-40646]](https://issues.apache.org/jira/browse/SPARK-40646) JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set `spark.sql.json.enablePartialResults` to `true`. 
The flag is disabled by default to preserve the original behavior\n+ [[SPARK-40903]](https://issues.apache.org/jira/browse/SPARK-40903)[SQL] Avoid reordering decimal Add for canonicalization if data type is changed\n+ [[SPARK-40618]](https://issues.apache.org/jira/browse/SPARK-40618)[SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries using reference tracking\n+ [[SPARK-40697]](https://issues.apache.org/jira/browse/SPARK-40697)[SQL] Add read-side char padding to cover external data files\n+ Operating system security updates.\n* November 1, 2022 \n+ Structured Streaming in Unity Catalog now supports refreshing temporary access tokens. Streaming workloads running with Unity Catalog all purpose or jobs clusters no longer fail after the initial token expiry.\n+ Fixed an issue where if a Delta table had a user-defined column named `_change_type`, but **Change data feed** was disabled on that table, data in that column would incorrectly fill with NULL values when running `MERGE`.\n+ Fixed an issue where running `MERGE` and using exactly 99 columns from the source in the condition could result in `java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to org.apache.spark.sql.catalyst.InternalRow`.\n+ Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when `allowOverwrites` is enabled.\n+ Upgraded Apache commons-text to 1.10.0.\n+ [[SPARK-38881]](https://issues.apache.org/jira/browse/SPARK-38881)[DSTREAMS][KINESIS][PYSPARK] Added Support for CloudWatch MetricsLevel Config\n+ [[SPARK-40596]](https://issues.apache.org/jira/browse/SPARK-40596)[CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo\n+ [[SPARK-40670]](https://issues.apache.org/jira/browse/SPARK-40670)[SS][PYTHON] Fix NPE in applyInPandasWithState when the input schema has \u201cnon-nullable\u201d column(s)\n+ Operating system security updates. 
\n### [Databricks Runtime 11.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id10) \nSee [Databricks Runtime 11.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/11.2.html). \n* February 28, 2023 \n+ [[SPARK-42286]](https://issues.apache.org/jira/browse/SPARK-42286)[SQL] Fallback to previous codegen code path for complex expr with CAST\n+ [[SPARK-42346]](https://issues.apache.org/jira/browse/SPARK-42346)[SQL] Rewrite distinct aggregates after subquery merge\n+ Operating system security updates.\n* February 16, 2023 \n+ Users can now read and write certain Delta tables that require Reader version 3 and Writer version 7, by using Databricks Runtime 9.1 or later. To succeed, table features listed in the tables\u2019 protocol must be supported by the current version of Databricks Runtime.\n+ SYNC command supports syncing recreated Hive Metastore tables. If a HMS table has been SYNCed previously to Unity Catalog but then dropped and recreated, a subsequent re-sync will work instead of throwing TABLE\\_ALREADY\\_EXISTS status code.\n+ [[SPARK-41219]](https://issues.apache.org/jira/browse/SPARK-41219)[SQL] IntegralDivide use decimal(1, 0) to represent 0\n+ Operating system security updates.\n* January 31, 2023 \n+ Table types of JDBC tables are now EXTERNAL by default.\n+ [[SPARK-41379]](https://issues.apache.org/jira/browse/SPARK-41379)[SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark\n* January 18, 2023 \n+ Azure Synapse connector now returns a more descriptive error message when a column name contains invalid characters such as whitespaces or semicolons. In such cases, the following message will be returned: `Azure Synapse Analytics failed to execute the JDBC query produced by the connector. 
Make sure column names do not include any invalid characters such as ';' or whitespace`.\n+ [[SPARK-41198]](https://issues.apache.org/jira/browse/SPARK-41198)[SS] Fix metrics in streaming query having CTE and DSv1 streaming source\n+ [[SPARK-41862]](https://issues.apache.org/jira/browse/SPARK-41862)[SQL] Fix correctness bug related to DEFAULT values in Orc reader\n+ [[SPARK-41539]](https://issues.apache.org/jira/browse/SPARK-41539)[SQL] Remap stats and constraints against output in logical plan for LogicalRDD\n+ [[SPARK-41199]](https://issues.apache.org/jira/browse/SPARK-41199)[SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used\n+ [[SPARK-41339]](https://issues.apache.org/jira/browse/SPARK-41339)[SQL] Close and recreate RocksDB write batch instead of just clearing\n+ [[SPARK-41732]](https://issues.apache.org/jira/browse/SPARK-41732)[SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing\n+ [[SPARK-38277]](https://issues.apache.org/jira/browse/SPARK-38277)[SS] Clear write batch after RocksDB state store\u2019s commit\n+ Operating system security updates.\n* November 29, 2022 \n+ Users can configure leading and trailing whitespaces\u2019 behavior when writing data using the Redshift connector. The following options have been added to control whitespace handling: \n- `csvignoreleadingwhitespace`, when set to `true`, removes leading whitespace from values during writes when `tempformat` is set to `CSV` or `CSV GZIP`. Whitespaces are retained when the config is set to `false`. By default, the value is `true`.\n- `csvignoretrailingwhitespace`, when set to `true`, removes trailing whitespace from values during writes when `tempformat` is set to `CSV` or `CSV GZIP`. Whitespaces are retained when the config is set to `false`. 
By default, the value is `true`.\n+ Fixed a bug with JSON parsing in Auto Loader when all columns were left as strings (`cloudFiles.inferColumnTypes` was not set or set to `false`) and the JSON contained nested objects.\n+ [[SPARK-40906]](https://issues.apache.org/jira/browse/SPARK-40906)[SQL] `Mode` should copy keys before inserting into Map\n+ Operating system security updates.\n* November 15, 2022 \n+ [[SPARK-40646]](https://issues.apache.org/jira/browse/SPARK-40646) JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set `spark.sql.json.enablePartialResults` to `true`. The flag is disabled by default to preserve the original behavior.\n+ [[SPARK-40618]](https://issues.apache.org/jira/browse/SPARK-40618)[SQL] Fix bug in MergeScalarSubqueries rule with nested subqueries using reference tracking\n+ [[SPARK-40697]](https://issues.apache.org/jira/browse/SPARK-40697)[SQL] Add read-side char padding to cover external data files\n+ Operating system security updates.\n* November 1, 2022 \n+ Upgraded Apache commons-text to 1.10.0.\n+ Fixed an issue where if a Delta table had a user-defined column named `_change_type`, but **Change data feed** was disabled on that table, data in that column would incorrectly fill with NULL values when running `MERGE`.\n+ Fixed an issue where running `MERGE` and using exactly 99 columns from the source in the condition could result in `java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to org.apache.spark.sql.catalyst.InternalRow`.\n+ Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when `allowOverwrites` is enabled.\n+ [[SPARK-40596]](https://issues.apache.org/jira/browse/SPARK-40596)[CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo\n+ Operating system security 
updates.\n* October 19, 2022 \n+ Fixed an issue with COPY INTO usage with temporary credentials on Unity Catalog enabled clusters / warehouses.\n+ [[SPARK-40213]](https://issues.apache.org/jira/browse/SPARK-40213)[SQL] Support ASCII value conversion for Latin-1 characters\n+ Operating system security updates.\n* October 5, 2022 \n+ Users can run `spark.conf.set(\"spark.databricks.io.listKeysWithPrefix.azure.enabled\", \"true\")` to re-enable native listing for Auto Loader on ADLS Gen2. Native listing was previously turned off due to performance issues, but turning it off may have led to an increase in storage costs for customers. This change was rolled out to DBR 10.4 and 9.1 in the previous maintenance update.\n+ [[SPARK-40315]](https://issues.apache.org/jira/browse/SPARK-40315)[SQL] Support url encode/decode as built-in function and tidy up url-related functions\n+ [[SPARK-40156]](https://issues.apache.org/jira/browse/SPARK-40156)[SQL] `url_decode()` should return an error class\n+ [[SPARK-40169]](https://issues.apache.org/jira/browse/SPARK-40169) Don\u2019t pushdown Parquet filters with no reference to data schema\n+ [[SPARK-40460]](https://issues.apache.org/jira/browse/SPARK-40460)[SS] Fix streaming metrics when selecting `_metadata`\n+ [[SPARK-40468]](https://issues.apache.org/jira/browse/SPARK-40468)[SQL] Fix column pruning in CSV when `_corrupt_record` is selected\n+ [[SPARK-40055]](https://issues.apache.org/jira/browse/SPARK-40055)[SQL] listCatalogs should also return spark\\_catalog even when spark\\_catalog implementation is defaultSessionCatalog\n+ Operating system security updates.\n* September 22, 2022 \n+ [[SPARK-40315]](https://issues.apache.org/jira/browse/SPARK-40315)[SQL] Add hashCode() for Literal of ArrayBasedMapData\n+ [[SPARK-40389]](https://issues.apache.org/jira/browse/SPARK-40389)[SQL] Decimals can\u2019t upcast as integral types if the cast can overflow\n+ [[SPARK-40380]](https://issues.apache.org/jira/browse/SPARK-40380)[SQL] Fix 
constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan\n+ [[SPARK-40066]](https://issues.apache.org/jira/browse/SPARK-40066)[SQL][FOLLOW-UP] Check if ElementAt is resolved before getting its dataType\n+ [[SPARK-40109]](https://issues.apache.org/jira/browse/SPARK-40109)[SQL] New SQL function: get()\n+ [[SPARK-40066]](https://issues.apache.org/jira/browse/SPARK-40066)[SQL] ANSI mode: always return null on invalid access to map column\n+ [[SPARK-40089]](https://issues.apache.org/jira/browse/SPARK-40089)[SQL] Fix sorting for some Decimal types\n+ [[SPARK-39887]](https://issues.apache.org/jira/browse/SPARK-39887)[SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique\n+ [[SPARK-40152]](https://issues.apache.org/jira/browse/SPARK-40152)[SQL] Fix split\\_part codegen compilation issue\n+ [[SPARK-40235]](https://issues.apache.org/jira/browse/SPARK-40235)[CORE] Use interruptible lock instead of synchronized in Executor.updateDependencies()\n+ [[SPARK-40212]](https://issues.apache.org/jira/browse/SPARK-40212)[SQL] SparkSQL castPartValue does not properly handle byte, short, or float\n+ [[SPARK-40218]](https://issues.apache.org/jira/browse/SPARK-40218)[SQL] GROUPING SETS should preserve the grouping columns\n+ [[SPARK-35542]](https://issues.apache.org/jira/browse/SPARK-35542)[ML] Fix: Bucketizer created for multiple columns with parameters\n+ [[SPARK-40079]](https://issues.apache.org/jira/browse/SPARK-40079) Add Imputer inputCols validation for empty input case\n+ [[SPARK-39912]](https://issues.apache.org/jira/browse/SPARK-39912)[SPARK-39828](https://issues.apache.org/jira/browse/SPARK-39828)[SQL] Refine CatalogImpl \n### [Databricks Runtime 11.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id11) \nSee [Databricks Runtime 11.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/11.1.html). 
\n* January 31, 2023 \n+ [[SPARK-41379]](https://issues.apache.org/jira/browse/SPARK-41379)[SS][PYTHON] Provide cloned spark session in DataFrame in user function for foreachBatch sink in PySpark\n+ Miscellaneous bug fixes.\n* January 18, 2023 \n+ Azure Synapse connector now returns a more descriptive error message when a column name contains invalid characters such as whitespaces or semicolons. In such cases, the following message will be returned: `Azure Synapse Analytics failed to execute the JDBC query produced by the connector. Make sure column names do not include any invalid characters such as ';' or whitespace`.\n+ [[SPARK-41198]](https://issues.apache.org/jira/browse/SPARK-41198)[SS] Fix metrics in streaming query having CTE and DSv1 streaming source\n+ [[SPARK-41862]](https://issues.apache.org/jira/browse/SPARK-41862)[SQL] Fix correctness bug related to DEFAULT values in Orc reader\n+ [[SPARK-41199]](https://issues.apache.org/jira/browse/SPARK-41199)[SS] Fix metrics issue when DSv1 streaming source and DSv2 streaming source are co-used\n+ [[SPARK-41339]](https://issues.apache.org/jira/browse/SPARK-41339)[SQL] Close and recreate RocksDB write batch instead of just clearing\n+ [[SPARK-41732]](https://issues.apache.org/jira/browse/SPARK-41732)[SQL][SS] Apply tree-pattern based pruning for the rule SessionWindowing\n+ [[SPARK-38277]](https://issues.apache.org/jira/browse/SPARK-38277)[SS] Clear write batch after RocksDB state store\u2019s commit\n+ Operating system security updates.\n* November 29, 2022 \n+ Users can configure leading and trailing whitespaces\u2019 behavior when writing data using the Redshift connector. The following options have been added to control whitespace handling: \n- `csvignoreleadingwhitespace`, when set to `true`, removes leading whitespace from values during writes when `tempformat` is set to `CSV` or `CSV GZIP`. Whitespaces are retained when the config is set to `false`. 
By default, the value is `true`.\n- `csvignoretrailingwhitespace`, when set to `true`, removes trailing whitespace from values during writes when `tempformat` is set to `CSV` or `CSV GZIP`. Whitespaces are retained when the config is set to `false`. By default, the value is `true`.\n+ Fixed a bug with JSON parsing in Auto Loader when all columns were left as strings (`cloudFiles.inferColumnTypes` was not set or set to `false`) and the JSON contained nested objects.\n+ [[SPARK-39650]](https://issues.apache.org/jira/browse/SPARK-39650)[SS] Fix incorrect value schema in streaming deduplication with backward compatibility\n+ Operating system security updates.\n* November 15, 2022 \n+ [[SPARK-40646]](https://issues.apache.org/jira/browse/SPARK-40646) JSON parsing for structs, maps, and arrays has been fixed so when a part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set `spark.sql.json.enablePartialResults` to `true`. 
The flag is disabled by default to preserve the original behavior.\n+ Operating system security updates.\n* November 1, 2022 \n+ Upgraded Apache commons-text to 1.10.0.\n+ Fixed an issue where if a Delta table had a user-defined column named `_change_type`, but **Change data feed** was disabled on that table, data in that column would incorrectly fill with NULL values when running `MERGE`.\n+ Fixed an issue where running `MERGE` and using exactly 99 columns from the source in the condition could result in `java.lang.ClassCastException: org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to org.apache.spark.sql.catalyst.InternalRow`.\n+ Fixed an issue with Auto Loader where a file can be duplicated in the same micro-batch when `allowOverwrites` is enabled.\n+ [[SPARK-40697]](https://issues.apache.org/jira/browse/SPARK-40697)[SQL] Add read-side char padding to cover external data files\n+ [[SPARK-40596]](https://issues.apache.org/jira/browse/SPARK-40596)[CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo\n+ Operating system security updates.\n* October 18, 2022 \n+ Fixed an issue with COPY INTO usage with temporary credentials on Unity Catalog enabled clusters / warehouses.\n+ [[SPARK-40213]](https://issues.apache.org/jira/browse/SPARK-40213)[SQL] Support ASCII value conversion for Latin-1 characters\n+ Operating system security updates.\n* October 5, 2022 \n+ Users can run `spark.conf.set(\"spark.databricks.io.listKeysWithPrefix.azure.enabled\", \"true\")` to re-enable native listing for Auto Loader on ADLS Gen2. Native listing was previously turned off due to performance issues, but turning it off may have led to an increase in storage costs for customers. 
This change was rolled out to DBR 10.4 and 9.1 in the previous maintenance update.\n+ [[SPARK-40169]](https://issues.apache.org/jira/browse/SPARK-40169) Don\u2019t pushdown Parquet filters with no reference to data schema\n+ [[SPARK-40460]](https://issues.apache.org/jira/browse/SPARK-40460)[SS] Fix streaming metrics when selecting `_metadata`\n+ [[SPARK-40468]](https://issues.apache.org/jira/browse/SPARK-40468)[SQL] Fix column pruning in CSV when `_corrupt_record` is selected\n+ [[SPARK-40055]](https://issues.apache.org/jira/browse/SPARK-40055)[SQL] listCatalogs should also return spark\\_catalog even when spark\\_catalog implementation is defaultSessionCatalog\n+ Operating system security updates.\n* September 22, 2022 \n+ [[SPARK-40315]](https://issues.apache.org/jira/browse/SPARK-40315)[SQL] Add hashCode() for Literal of ArrayBasedMapData\n+ [[SPARK-40380]](https://issues.apache.org/jira/browse/SPARK-40380)[SQL] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan\n+ [[SPARK-40089]](https://issues.apache.org/jira/browse/SPARK-40089)[SQL] Fix sorting for some Decimal types\n+ [[SPARK-39887]](https://issues.apache.org/jira/browse/SPARK-39887)[SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique\n+ [[SPARK-40152]](https://issues.apache.org/jira/browse/SPARK-40152)[SQL] Fix split\\_part codegen compilation issue\n* September 6, 2022 \n+ We have updated the permission model in Table Access Controls (Table ACLs) so that only MODIFY permissions are needed to change a table\u2019s schema or table properties with ALTER TABLE. Previously, these operations required a user to own the table. Ownership is still required to grant permissions on a table, change its owner, change its location, or rename it. 
This change makes the permission model for Table ACLs more consistent with Unity Catalog.\n+ [[SPARK-40235]](https://issues.apache.org/jira/browse/SPARK-40235)[CORE] Use interruptible lock instead of synchronized in Executor.updateDependencies()\n+ [[SPARK-40212]](https://issues.apache.org/jira/browse/SPARK-40212)[SQL] SparkSQL castPartValue does not properly handle byte, short, or float\n+ [[SPARK-40218]](https://issues.apache.org/jira/browse/SPARK-40218)[SQL] GROUPING SETS should preserve the grouping columns\n+ [[SPARK-39976]](https://issues.apache.org/jira/browse/SPARK-39976)[SQL] ArrayIntersect should handle null in left expression correctly\n+ [[SPARK-40053]](https://issues.apache.org/jira/browse/SPARK-40053)[CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment\n+ [[SPARK-35542]](https://issues.apache.org/jira/browse/SPARK-35542)[CORE][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it\n+ [[SPARK-40079]](https://issues.apache.org/jira/browse/SPARK-40079)[CORE] Add Imputer inputCols validation for empty input case\n* August 24, 2022 \n+ Shares, providers, and recipients now support SQL commands to change owners, comment, rename\n+ [[SPARK-39983]](https://issues.apache.org/jira/browse/SPARK-39983)[CORE][SQL] Do not cache unserialized broadcast relations on the driver\n+ [[SPARK-39912]](https://issues.apache.org/jira/browse/SPARK-39912)[[SPARK-39828]](https://issues.apache.org/jira/browse/SPARK-39828)[SQL] Refine CatalogImpl\n+ [[SPARK-39775]](https://issues.apache.org/jira/browse/SPARK-39775)[CORE][AVRO] Disable validate default values when parsing Avro schemas\n+ [[SPARK-39806]](https://issues.apache.org/jira/browse/SPARK-39806) Fixed the issue on queries accessing METADATA struct crash on partitioned tables\n+ [[SPARK-39867]](https://issues.apache.org/jira/browse/SPARK-39867)[SQL] Global limit should not inherit 
OrderPreservingUnaryNode\n+ [[SPARK-39962]](https://issues.apache.org/jira/browse/SPARK-39962)[PYTHON][SQL] Apply projection when group attributes are empty\n+ [[SPARK-39839]](https://issues.apache.org/jira/browse/SPARK-39839)[SQL] Handle special case of null variable-length Decimal with non-zero offsetAndSize in UnsafeRow structural integrity check\n+ [[SPARK-39713]](https://issues.apache.org/jira/browse/SPARK-39713)[SQL] ANSI mode: add suggestion of using try\\_element\\_at for INVALID\\_ARRAY\\_INDEX error\n+ [[SPARK-39847]](https://issues.apache.org/jira/browse/SPARK-39847)[SS] Fix race condition in RocksDBLoader.loadLibrary() if caller thread is interrupted\n+ [[SPARK-39731]](https://issues.apache.org/jira/browse/SPARK-39731)[SQL] Fix issue in CSV and JSON data sources when parsing dates in \u201cyyyyMMdd\u201d format with CORRECTED time parser policy\n+ Operating system security updates.\n* August 10, 2022 \n+ For Delta tables with table access control, automatic schema evolution through DML statements such as `INSERT` and `MERGE` is now available for all users who have `MODIFY` permissions on such tables. Additionally, permissions required to perform schema evolution with `COPY INTO` are now lowered from `OWNER` to `MODIFY` for consistency with other commands. 
These changes make the table ACL security model more consistent with the Unity Catalog security model as well as with other operations such as replacing a table.\n+ [[SPARK-39889]](https://issues.apache.org/jira/browse/SPARK-39889) Enhance the error message of division by 0\n+ [[SPARK-39795]](https://issues.apache.org/jira/browse/SPARK-39795) [SQL] New SQL function: try\\_to\\_timestamp\n+ [[SPARK-39749]](https://issues.apache.org/jira/browse/SPARK-39749) Always use plain string representation on casting decimal as string under ANSI mode\n+ [[SPARK-39625]](https://issues.apache.org/jira/browse/SPARK-39625) Rename df.as to df.to\n+ [[SPARK-39787]](https://issues.apache.org/jira/browse/SPARK-39787) [SQL] Use error class in the parsing error of function to\\_timestamp\n+ [[SPARK-39625]](https://issues.apache.org/jira/browse/SPARK-39625) [SQL] Add Dataset.as(StructType)\n+ [[SPARK-39689]](https://issues.apache.org/jira/browse/SPARK-39689) Support 2-chars `lineSep` in CSV datasource\n+ [[SPARK-39579]](https://issues.apache.org/jira/browse/SPARK-39579) [SQL][PYTHON][R] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace\n+ [[SPARK-39702]](https://issues.apache.org/jira/browse/SPARK-39702) [CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel\n+ [[SPARK-39575]](https://issues.apache.org/jira/browse/SPARK-39575) [AVRO] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer\n+ [[SPARK-39265]](https://issues.apache.org/jira/browse/SPARK-39265) [SQL] Fix test failure when SPARK\\_ANSI\\_SQL\\_MODE is enabled\n+ [[SPARK-39441]](https://issues.apache.org/jira/browse/SPARK-39441) [SQL] Speed up DeduplicateRelations\n+ [[SPARK-39497]](https://issues.apache.org/jira/browse/SPARK-39497) [SQL] Improve the analysis exception of missing map key column\n+ [[SPARK-39476]](https://issues.apache.org/jira/browse/SPARK-39476) [SQL] Disable Unwrap cast optimize when casting from Long to Float/ Double or from 
Integer to Float\n+ [[SPARK-39434]](https://issues.apache.org/jira/browse/SPARK-39434) [SQL] Provide runtime error query context when array index is out of bounding \n### [Databricks Runtime 11.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id12) \nSee [Databricks Runtime 11.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/11.0.html). \n* November 29, 2022 \n+ Users can configure how leading and trailing whitespace is handled when writing data using the Redshift connector. The following options have been added to control whitespace handling: \n- `csvignoreleadingwhitespace`, when set to `true`, removes leading whitespace from values during writes when `tempformat` is set to `CSV` or `CSV GZIP`. Whitespace is retained when the option is set to `false`. By default, the value is `true`.\n- `csvignoretrailingwhitespace`, when set to `true`, removes trailing whitespace from values during writes when `tempformat` is set to `CSV` or `CSV GZIP`. Whitespace is retained when the option is set to `false`. By default, the value is `true`.\n+ Fixed a bug with JSON parsing in Auto Loader when all columns were left as strings (`cloudFiles.inferColumnTypes` was not set or set to `false`) and the JSON contained nested objects.\n+ [[SPARK-39650]](https://issues.apache.org/jira/browse/SPARK-39650)[SS] Fix incorrect value schema in streaming deduplication with backward compatibility\n+ Operating system security updates.\n* November 15, 2022 \n+ [[SPARK-40646]](https://issues.apache.org/jira/browse/SPARK-40646) JSON parsing for structs, maps, and arrays has been fixed so that when part of a record does not match the schema, the rest of the record can still be parsed correctly instead of returning nulls. To opt in to the improved behavior, set `spark.sql.json.enablePartialResults` to `true`. 
The flag is disabled by default to preserve the original behavior.\n* November 1, 2022 \n+ Upgraded Apache commons-text to 1.10.0.\n+ Fixed an issue where if a Delta table had a user-defined column named `_change_type`, but **Change data feed** was disabled on that table, data in that column would incorrectly fill with NULL values when running `MERGE`.\n+ Fixed an issue with Auto Loader where a file could be duplicated in the same micro-batch when `allowOverwrites` is enabled.\n+ [[SPARK-40697]](https://issues.apache.org/jira/browse/SPARK-40697)[SQL] Add read-side char padding to cover external data files\n+ [[SPARK-40596]](https://issues.apache.org/jira/browse/SPARK-40596)[CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo\n+ Operating system security updates.\n* October 18, 2022 \n+ [[SPARK-40213]](https://issues.apache.org/jira/browse/SPARK-40213)[SQL] Support ASCII value conversion for Latin-1 characters\n+ Operating system security updates.\n* October 5, 2022 \n+ Users can call `spark.conf.set(\"spark.databricks.io.listKeysWithPrefix.azure.enabled\", \"true\")` to re-enable native listing for Auto Loader on ADLS Gen2. Native listing was previously turned off due to performance issues, but may have led to an increase in storage costs for customers. 
This change was rolled out to DBR 10.4 and 9.1 in the previous maintenance update.\n+ [[SPARK-40169]](https://issues.apache.org/jira/browse/SPARK-40169) Don\u2019t pushdown Parquet filters with no reference to data schema\n+ [[SPARK-40460]](https://issues.apache.org/jira/browse/SPARK-40460)[SS] Fix streaming metrics when selecting `_metadata`\n+ [[SPARK-40468]](https://issues.apache.org/jira/browse/SPARK-40468)[SQL] Fix column pruning in CSV when `_corrupt_record` is selected\n+ Operating system security updates.\n* September 22, 2022 \n+ [[SPARK-40315]](https://issues.apache.org/jira/browse/SPARK-40315)[SQL] Add hashCode() for Literal of ArrayBasedMapData\n+ [[SPARK-40380]](https://issues.apache.org/jira/browse/SPARK-40380)[SQL] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan\n+ [[SPARK-40089]](https://issues.apache.org/jira/browse/SPARK-40089)[SQL] Fix sorting for some Decimal types\n+ [[SPARK-39887]](https://issues.apache.org/jira/browse/SPARK-39887)[SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique\n+ [[SPARK-40152]](https://issues.apache.org/jira/browse/SPARK-40152)[SQL] Fix split\\_part codegen compilation issue\n* September 6, 2022 \n+ [[SPARK-40235]](https://issues.apache.org/jira/browse/SPARK-40235)[CORE] Use interruptible lock instead of synchronized in Executor.updateDependencies()\n+ [[SPARK-40212]](https://issues.apache.org/jira/browse/SPARK-40212)[SQL] SparkSQL castPartValue does not properly handle byte, short, or float\n+ [[SPARK-40218]](https://issues.apache.org/jira/browse/SPARK-40218)[SQL] GROUPING SETS should preserve the grouping columns\n+ [[SPARK-39976]](https://issues.apache.org/jira/browse/SPARK-39976)[SQL] ArrayIntersect should handle null in left expression correctly\n+ [[SPARK-40053]](https://issues.apache.org/jira/browse/SPARK-40053)[CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment\n+ 
[[SPARK-35542]](https://issues.apache.org/jira/browse/SPARK-35542)[CORE][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it\n+ [[SPARK-40079]](https://issues.apache.org/jira/browse/SPARK-40079)[CORE] Add Imputer inputCols validation for empty input case\n* August 24, 2022 \n+ [[SPARK-39983]](https://issues.apache.org/jira/browse/SPARK-39983)[CORE][SQL] Do not cache unserialized broadcast relations on the driver\n+ [[SPARK-39775]](https://issues.apache.org/jira/browse/SPARK-39775)[CORE][AVRO] Disable validate default values when parsing Avro schemas\n+ [[SPARK-39806]](https://issues.apache.org/jira/browse/SPARK-39806) Fixed the issue on queries accessing METADATA struct crash on partitioned tables\n+ [[SPARK-39867]](https://issues.apache.org/jira/browse/SPARK-39867)[SQL] Global limit should not inherit OrderPreservingUnaryNode\n+ [[SPARK-39962]](https://issues.apache.org/jira/browse/SPARK-39962)[PYTHON][SQL] Apply projection when group attributes are empty\n+ Operating system security updates.\n* August 9, 2022 \n+ [[SPARK-39713]](https://issues.apache.org/jira/browse/SPARK-39713)[SQL] ANSI mode: add suggestion of using try\\_element\\_at for INVALID\\_ARRAY\\_INDEX error\n+ [[SPARK-39847]](https://issues.apache.org/jira/browse/SPARK-39847) Fix race condition in RocksDBLoader.loadLibrary() if caller thread is interrupted\n+ [[SPARK-39731]](https://issues.apache.org/jira/browse/SPARK-39731)[SQL] Fix issue in CSV and JSON data sources when parsing dates in \u201cyyyyMMdd\u201d format with CORRECTED time parser policy\n+ [[SPARK-39889]](https://issues.apache.org/jira/browse/SPARK-39889) Enhance the error message of division by 0\n+ [[SPARK-39795]](https://issues.apache.org/jira/browse/SPARK-39795)[SQL] New SQL function: try\\_to\\_timestamp\n+ [[SPARK-39749]](https://issues.apache.org/jira/browse/SPARK-39749) Always use plain string representation on casting decimal as string under 
ANSI mode\n+ [[SPARK-39625]](https://issues.apache.org/jira/browse/SPARK-39625)[SQL] Add Dataset.to(StructType)\n+ [[SPARK-39787]](https://issues.apache.org/jira/browse/SPARK-39787)[SQL] Use error class in the parsing error of function to\\_timestamp\n+ Operating system security updates.\n* July 27, 2022 \n+ [[SPARK-39689]](https://issues.apache.org/jira/browse/SPARK-39689)Support 2-chars `lineSep` in CSV datasource\n+ [[SPARK-39104]](https://issues.apache.org/jira/browse/SPARK-39104)[SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe\n+ [[SPARK-39702]](https://issues.apache.org/jira/browse/SPARK-39702)[CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel\n+ [[SPARK-39575]](https://issues.apache.org/jira/browse/SPARK-39575)[AVRO] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer\n+ [[SPARK-39497]](https://issues.apache.org/jira/browse/SPARK-39497)[SQL] Improve the analysis exception of missing map key column\n+ [[SPARK-39441]](https://issues.apache.org/jira/browse/SPARK-39441)[SQL] Speed up DeduplicateRelations\n+ [[SPARK-39476]](https://issues.apache.org/jira/browse/SPARK-39476)[SQL] Disable Unwrap cast optimize when casting from Long to Float/ Double or from Integer to Float\n+ [[SPARK-39434]](https://issues.apache.org/jira/browse/SPARK-39434)[SQL] Provide runtime error query context when array index is out of bounding\n+ [[SPARK-39570]](https://issues.apache.org/jira/browse/SPARK-39570)[SQL] Inline table should allow expressions with alias\n+ Operating system security updates.\n* July 13, 2022 \n+ Make Delta MERGE operation results consistent when source is non-deterministic.\n+ Fixed an issue for the cloud\\_files\\_state TVF when running on non-DBFS paths.\n+ Disabled Auto Loader\u2019s use of native cloud APIs for directory listing on Azure.\n+ [[SPARK-38796]](https://issues.apache.org/jira/browse/SPARK-38796)[SQL] Update to\\_number and try\\_to\\_number functions to allow PR with 
positive numbers\n+ [[SPARK-39272]](https://issues.apache.org/jira/browse/SPARK-39272)[SQL] Increase the start position of query context by 1\n+ [[SPARK-39419]](https://issues.apache.org/jira/browse/SPARK-39419)[SQL] Fix ArraySort to throw an exception when the comparator returns null\n+ Operating system security updates.\n* July 5, 2022 \n+ Improvement on error messages for a range of error classes.\n+ [[SPARK-39451]](https://issues.apache.org/jira/browse/SPARK-39451)[SQL] Support casting intervals to integrals in ANSI mode\n+ [[SPARK-39361]](https://issues.apache.org/jira/browse/SPARK-39361) Don\u2019t use Log4J2\u2019s extended throwable conversion pattern in default logging configurations\n+ [[SPARK-39354]](https://issues.apache.org/jira/browse/SPARK-39354)[SQL] Ensure show `Table or view not found` even if there are `dataTypeMismatchError` related to `Filter` at the same time\n+ [[SPARK-38675]](https://issues.apache.org/jira/browse/SPARK-38675)[CORE] Fix race during unlock in BlockInfoManager\n+ [[SPARK-39392]](https://issues.apache.org/jira/browse/SPARK-39392)[SQL] Refine ANSI error messages for try\\_\\* function hints\n+ [[SPARK-39214]](https://issues.apache.org/jira/browse/SPARK-39214)[SQL][3.3] Improve errors related to CAST\n+ [[SPARK-37939]](https://issues.apache.org/jira/browse/SPARK-37939)[SQL] Use error classes in the parsing errors of properties\n+ [[SPARK-39085]](https://issues.apache.org/jira/browse/SPARK-39085)[SQL] Move the error message of `INCONSISTENT_BEHAVIOR_CROSS_VERSION` to error-classes.json\n+ [[SPARK-39376]](https://issues.apache.org/jira/browse/SPARK-39376)[SQL] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN\n+ [[SPARK-39283]](https://issues.apache.org/jira/browse/SPARK-39283)[CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator\n+ [[SPARK-39285]](https://issues.apache.org/jira/browse/SPARK-39285)[SQL] Spark should not check field names when reading files\n+ 
Operating system security updates. \n### [Databricks Runtime 10.5 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id13) \nSee [Databricks Runtime 10.5 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/10.5.html). \n* November 1, 2022 \n+ Fixed an issue where if a Delta table had a user-defined column named `_change_type`, but **Change data feed** was disabled on that table, data in that column would incorrectly fill with NULL values when running `MERGE`.\n+ [[SPARK-40697]](https://issues.apache.org/jira/browse/SPARK-40697)[SQL] Add read-side char padding to cover external data files\n+ [[SPARK-40596]](https://issues.apache.org/jira/browse/SPARK-40596)[CORE] Populate ExecutorDecommission with messages in ExecutorDecommissionInfo\n+ Operating system security updates.\n* October 18, 2022 \n+ Operating system security updates.\n* October 5, 2022 \n+ Users can call `spark.conf.set(\"spark.databricks.io.listKeysWithPrefix.azure.enabled\", \"true\")` to re-enable native listing for Auto Loader on ADLS Gen2. Native listing was previously turned off due to performance issues, but may have led to an increase in storage costs for customers. 
This change was rolled out to DBR 10.4 and 9.1 in the previous maintenance update.\n+ reload4j has been upgraded to 1.2.19 to fix vulnerabilities.\n+ [[SPARK-40460]](https://issues.apache.org/jira/browse/SPARK-40460)[SS] Fix streaming metrics when selecting `_metadata`\n+ [[SPARK-40468]](https://issues.apache.org/jira/browse/SPARK-40468)[SQL] Fix column pruning in CSV when `_corrupt_record` is selected\n+ Operating system security updates.\n* September 22, 2022 \n+ [[SPARK-40315]](https://issues.apache.org/jira/browse/SPARK-40315)[SQL] Add hashCode() for Literal of ArrayBasedMapData\n+ [[SPARK-40213]](https://issues.apache.org/jira/browse/SPARK-40213)[SQL] Support ASCII value conversion for Latin-1 characters\n+ [[SPARK-40380]](https://issues.apache.org/jira/browse/SPARK-40380)[SQL] Fix constant-folding of InvokeLike to avoid non-serializable literal embedded in the plan\n+ [[SPARK-38404]](https://issues.apache.org/jira/browse/SPARK-38404)[SQL] Improve CTE resolution when a nested CTE references an outer CTE\n+ [[SPARK-40089]](https://issues.apache.org/jira/browse/SPARK-40089)[SQL] Fix sorting for some Decimal types\n+ [[SPARK-39887]](https://issues.apache.org/jira/browse/SPARK-39887)[SQL] RemoveRedundantAliases should keep aliases that make the output of projection nodes unique\n+ Operating system security updates.\n* September 6, 2022 \n+ [[SPARK-40235]](https://issues.apache.org/jira/browse/SPARK-40235)[CORE] Use interruptible lock instead of synchronized in Executor.updateDependencies()\n+ [[SPARK-39976]](https://issues.apache.org/jira/browse/SPARK-39976)[SQL] ArrayIntersect should handle null in left expression correctly\n+ [[SPARK-40053]](https://issues.apache.org/jira/browse/SPARK-40053)[CORE][SQL][TESTS] Add `assume` to dynamic cancel cases which requiring Python runtime environment\n+ [[SPARK-35542]](https://issues.apache.org/jira/browse/SPARK-35542)[CORE][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols 
can not be loaded after saving it\n+ [[SPARK-40079]](https://issues.apache.org/jira/browse/SPARK-40079)[CORE] Add Imputer inputCols validation for empty input case\n* August 24, 2022 \n+ [[SPARK-39983]](https://issues.apache.org/jira/browse/SPARK-39983)[CORE][SQL] Do not cache unserialized broadcast relations on the driver\n+ [[SPARK-39775]](https://issues.apache.org/jira/browse/SPARK-39775)[CORE][AVRO] Disable validate default values when parsing Avro schemas\n+ [[SPARK-39806]](https://issues.apache.org/jira/browse/SPARK-39806) Fixed the issue on queries accessing METADATA struct crash on partitioned tables\n+ [[SPARK-39962]](https://issues.apache.org/jira/browse/SPARK-39962)[PYTHON][SQL] Apply projection when group attributes are empty\n+ [[SPARK-37643]](https://issues.apache.org/jira/browse/SPARK-37643)[SQL] when charVarcharAsString is true, for char datatype predicate query should skip rpadding rule\n+ Operating system security updates.\n* August 9, 2022 \n+ [[SPARK-39847]](https://issues.apache.org/jira/browse/SPARK-39847) Fix race condition in RocksDBLoader.loadLibrary() if caller thread is interrupted\n+ [[SPARK-39731]](https://issues.apache.org/jira/browse/SPARK-39731)[SQL] Fix issue in CSV and JSON data sources when parsing dates in \u201cyyyyMMdd\u201d format with CORRECTED time parser policy\n+ Operating system security updates.\n* July 27, 2022 \n+ [[SPARK-39625]](https://issues.apache.org/jira/browse/SPARK-39625)[SQL] Add Dataset.as(StructType)\n+ [[SPARK-39689]](https://issues.apache.org/jira/browse/SPARK-39689)Support 2-chars `lineSep` in CSV datasource\n+ [[SPARK-39104]](https://issues.apache.org/jira/browse/SPARK-39104)[SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe\n+ [[SPARK-39570]](https://issues.apache.org/jira/browse/SPARK-39570)[SQL] Inline table should allow expressions with alias\n+ [[SPARK-39702]](https://issues.apache.org/jira/browse/SPARK-39702)[CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by 
using a shared byteRawChannel\n+ [[SPARK-39575]](https://issues.apache.org/jira/browse/SPARK-39575)[AVRO] add ByteBuffer#rewind after ByteBuffer#get in AvroDeserializer\n+ [[SPARK-39476]](https://issues.apache.org/jira/browse/SPARK-39476)[SQL] Disable Unwrap cast optimize when casting from Long to Float/ Double or from Integer to Float\n+ Operating system security updates.\n* July 13, 2022 \n+ Make Delta MERGE operation results consistent when source is non-deterministic.\n+ [[SPARK-39355]](https://issues.apache.org/jira/browse/SPARK-39355)[SQL] Single column uses quoted to construct UnresolvedAttribute\n+ [[SPARK-39548]](https://issues.apache.org/jira/browse/SPARK-39548)[SQL] CreateView Command with a window clause query hit a wrong window definition not found issue\n+ [[SPARK-39419]](https://issues.apache.org/jira/browse/SPARK-39419)[SQL] Fix ArraySort to throw an exception when the comparator returns null\n+ Disabled Auto Loader\u2019s use of native cloud APIs for directory listing on Azure.\n+ Operating system security updates.\n* July 5, 2022 \n+ [[SPARK-39376]](https://issues.apache.org/jira/browse/SPARK-39376)[SQL] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN\n+ Operating system security updates.\n* June 15, 2022 \n+ [[SPARK-39283]](https://issues.apache.org/jira/browse/SPARK-39283)[CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator\n+ [[SPARK-39285]](https://issues.apache.org/jira/browse/SPARK-39285)[SQL] Spark should not check field names when reading files\n+ [[SPARK-34096]](https://issues.apache.org/jira/browse/SPARK-34096)[SQL] Improve performance for nth\\_value ignore nulls over offset window\n+ [[SPARK-36718]](https://issues.apache.org/jira/browse/SPARK-36718)[SQL][FOLLOWUP] Fix the `isExtractOnly` check in CollapseProject\n* June 2, 2022 \n+ [[SPARK-39166]](https://issues.apache.org/jira/browse/SPARK-39166)[SQL] Provide runtime error query context for binary arithmetic when 
WSCG is off\n+ [[SPARK-39093]](https://issues.apache.org/jira/browse/SPARK-39093)[SQL] Avoid codegen compilation error when dividing year-month intervals or day-time intervals by an integral\n+ [[SPARK-38990]](https://issues.apache.org/jira/browse/SPARK-38990)[SQL] Avoid NullPointerException when evaluating date\\_trunc/trunc format as a bound reference\n+ Operating system security updates.\n* May 18, 2022 \n+ Fixes a potential native memory leak in Auto Loader.\n+ [[SPARK-38868]](https://issues.apache.org/jira/browse/SPARK-38868)[SQL]Don\u2019t propagate exceptions from filter predicate when optimizing outer joins\n+ [[SPARK-38796]](https://issues.apache.org/jira/browse/SPARK-38796)[SQL] Implement the to\\_number and try\\_to\\_number SQL functions according to a new specification\n+ [[SPARK-38918]](https://issues.apache.org/jira/browse/SPARK-38918)[SQL] Nested column pruning should filter out attributes that do not belong to the current relation\n+ [[SPARK-38929]](https://issues.apache.org/jira/browse/SPARK-38929)[SQL] Improve error messages for cast failures in ANSI\n+ [[SPARK-38926]](https://issues.apache.org/jira/browse/SPARK-38926)[SQL] Output types in error messages in SQL style\n+ [[SPARK-39084]](https://issues.apache.org/jira/browse/SPARK-39084)[PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion\n+ [[SPARK-32268]](https://issues.apache.org/jira/browse/SPARK-32268)[SQL] Add ColumnPruning in injectBloomFilter\n+ [[SPARK-38908]](https://issues.apache.org/jira/browse/SPARK-38908)[SQL] Provide query context in runtime error of Casting from String to Number/Date/Timestamp/Boolean\n+ [[SPARK-39046]](https://issues.apache.org/jira/browse/SPARK-39046)[SQL] Return an empty context string if TreeNode.origin is wrongly set\n+ [[SPARK-38974]](https://issues.apache.org/jira/browse/SPARK-38974)[SQL] Filter registered functions with a given database name in list functions\n+ 
[[SPARK-38762]](https://issues.apache.org/jira/browse/SPARK-38762)[SQL] Provide query context in Decimal overflow errors\n+ [[SPARK-38931]](https://issues.apache.org/jira/browse/SPARK-38931)[SS] Create root dfs directory for RocksDBFileManager with unknown number of keys on 1st checkpoint\n+ [[SPARK-38992]](https://issues.apache.org/jira/browse/SPARK-38992)[CORE] Avoid using bash -c in ShellBasedGroupsMappingProvider\n+ [[SPARK-38716]](https://issues.apache.org/jira/browse/SPARK-38716)[SQL] Provide query context in map key not exists error\n+ [[SPARK-38889]](https://issues.apache.org/jira/browse/SPARK-38889)[SQL] Compile boolean column filters to use the bit type for MSSQL data source\n+ [[SPARK-38698]](https://issues.apache.org/jira/browse/SPARK-38698)[SQL] Provide query context in runtime error of Divide/Div/Reminder/Pmod\n+ [[SPARK-38823]](https://issues.apache.org/jira/browse/SPARK-38823)[SQL] Make `NewInstance` non-foldable to fix aggregation buffer corruption issue\n+ [[SPARK-38809]](https://issues.apache.org/jira/browse/SPARK-38809)[SS] Implement option to skip null values in symmetric hash implementation of stream-stream joins\n+ [[SPARK-38676]](https://issues.apache.org/jira/browse/SPARK-38676)[SQL] Provide SQL query context in runtime error message of Add/Subtract/Multiply\n+ [[SPARK-38677]](https://issues.apache.org/jira/browse/SPARK-38677)[PYSPARK] Python MonitorThread should detect deadlock due to blocking I/O\n+ Operating system security updates. \n### [Databricks Runtime 10.3 (Unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id14) \nSee [Databricks Runtime 10.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/10.3.html). 
\n* July 27, 2022 \n+ [[SPARK-39689]](https://issues.apache.org/jira/browse/SPARK-39689)Support 2-chars `lineSep` in CSV datasource\n+ [[SPARK-39104]](https://issues.apache.org/jira/browse/SPARK-39104)[SQL] InMemoryRelation#isCachedColumnBuffersLoaded should be thread-safe\n+ [[SPARK-39702]](https://issues.apache.org/jira/browse/SPARK-39702)[CORE] Reduce memory overhead of TransportCipher$EncryptedMessage by using a shared byteRawChannel\n+ Operating system security updates.\n* July 20, 2022 \n+ Make Delta MERGE operation results consistent when source is non-deterministic.\n+ [[SPARK-39476]](https://issues.apache.org/jira/browse/SPARK-39476)[SQL] Disable Unwrap cast optimize when casting from Long to Float/ Double or from Integer to Float\n+ [[SPARK-39548]](https://issues.apache.org/jira/browse/SPARK-39548)[SQL] CreateView Command with a window clause query hit a wrong window definition not found issue\n+ [[SPARK-39419]](https://issues.apache.org/jira/browse/SPARK-39419)[SQL] Fix ArraySort to throw an exception when the comparator returns null\n+ Operating system security updates.\n* July 5, 2022 \n+ [[SPARK-39376]](https://issues.apache.org/jira/browse/SPARK-39376)[SQL] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN\n+ Operating system security updates.\n* June 15, 2022 \n+ [[SPARK-39283]](https://issues.apache.org/jira/browse/SPARK-39283)[CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator\n+ [[SPARK-39285]](https://issues.apache.org/jira/browse/SPARK-39285)[SQL] Spark should not check field names when reading files\n+ [[SPARK-34096]](https://issues.apache.org/jira/browse/SPARK-34096)[SQL] Improve performance for nth\\_value ignore nulls over offset window\n+ [[SPARK-36718]](https://issues.apache.org/jira/browse/SPARK-36718)[SQL][FOLLOWUP] Fix the `isExtractOnly` check in CollapseProject\n* June 2, 2022 \n+ [[SPARK-38990]](https://issues.apache.org/jira/browse/SPARK-38990)[SQL] Avoid 
NullPointerException when evaluating date\\_trunc/trunc format as a bound reference\n+ Operating system security updates.\n* May 18, 2022 \n+ Fixes a potential native memory leak in Auto Loader.\n+ [[SPARK-38918]](https://issues.apache.org/jira/browse/SPARK-38918)[SQL] Nested column pruning should filter out attributes that do not belong to the current relation\n+ [[SPARK-37593]](https://issues.apache.org/jira/browse/SPARK-37593)[CORE] Reduce default page size by LONG\\_ARRAY\\_OFFSET if G1GC and ON\\_HEAP are used\n+ [[SPARK-39084]](https://issues.apache.org/jira/browse/SPARK-39084)[PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion\n+ [[SPARK-32268]](https://issues.apache.org/jira/browse/SPARK-32268)[SQL] Add ColumnPruning in injectBloomFilter\n+ [[SPARK-38974]](https://issues.apache.org/jira/browse/SPARK-38974)[SQL] Filter registered functions with a given database name in list functions\n+ [[SPARK-38889]](https://issues.apache.org/jira/browse/SPARK-38889)[SQL] Compile boolean column filters to use the bit type for MSSQL data source\n+ Operating system security updates.\n* May 4, 2022 \n+ Upgraded Java AWS SDK from version 1.11.655 to 1.12.1899.\n* April 19, 2022 \n+ [[SPARK-38616]](https://issues.apache.org/jira/browse/SPARK-38616)[SQL] Keep track of SQL query text in Catalyst TreeNode\n+ Operating system security updates.\n* April 6, 2022 \n+ [[SPARK-38631]](https://issues.apache.org/jira/browse/SPARK-38631)[CORE] Uses Java-based implementation for un-tarring at Utils.unpack\n+ Operating system security updates.\n* March 22, 2022 \n+ Changed the current working directory of notebooks on High Concurrency clusters with either table access control or credential passthrough enabled to the user\u2019s home directory. 
Previously, the working directory was `/databricks/driver`.\n+ [[SPARK-38437]](https://issues.apache.org/jira/browse/SPARK-38437)[SQL] Lenient serialization of datetime from datasource\n+ [[SPARK-38180]](https://issues.apache.org/jira/browse/SPARK-38180)[SQL] Allow safe up-cast expressions in correlated equality predicates\n+ [[SPARK-38155]](https://issues.apache.org/jira/browse/SPARK-38155)[SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates\n+ [[SPARK-38325]](https://issues.apache.org/jira/browse/SPARK-38325)[SQL] ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()\n* March 14, 2022 \n+ Improved transaction conflict detection for empty transactions in Delta Lake.\n+ [[SPARK-38185]](https://issues.apache.org/jira/browse/SPARK-38185)[SQL] Fix data incorrect if aggregate function is empty\n+ [[SPARK-38318]](https://issues.apache.org/jira/browse/SPARK-38318)[SQL] regression when replacing a dataset view\n+ [[SPARK-38236]](https://issues.apache.org/jira/browse/SPARK-38236)[SQL] Absolute file paths specified in create/alter table are treated as relative\n+ [[SPARK-35937]](https://issues.apache.org/jira/browse/SPARK-35937)[SQL] Extracting date field from timestamp should work in ANSI mode\n+ [[SPARK-34069]](https://issues.apache.org/jira/browse/SPARK-34069)[SQL] Kill barrier tasks should respect `SPARK_JOB_INTERRUPT_ON_CANCEL`\n+ [[SPARK-37707]](https://issues.apache.org/jira/browse/SPARK-37707)[SQL] Allow store assignment between TimestampNTZ and Date/Timestamp\n* February 23, 2022 \n+ [[SPARK-27442]](https://issues.apache.org/jira/browse/SPARK-27442)[SQL] Remove check field name when reading/writing data in parquet \n### [Databricks Runtime 10.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id15) \nSee [Databricks Runtime 10.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/10.2.html). 
\n* June 15, 2022 \n+ [[SPARK-39283]](https://issues.apache.org/jira/browse/SPARK-39283)[CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator\n+ [[SPARK-39285]](https://issues.apache.org/jira/browse/SPARK-39285)[SQL] Spark should not check field names when reading files\n+ [[SPARK-34096]](https://issues.apache.org/jira/browse/SPARK-34096)[SQL] Improve performance for nth\\_value ignore nulls over offset window\n* June 2, 2022 \n+ [[SPARK-38918]](https://issues.apache.org/jira/browse/SPARK-38918)[SQL] Nested column pruning should filter out attributes that do not belong to the current relation\n+ [[SPARK-38990]](https://issues.apache.org/jira/browse/SPARK-38990)[SQL] Avoid NullPointerException when evaluating date\\_trunc/trunc format as a bound reference\n+ Operating system security updates.\n* May 18, 2022 \n+ Fixes a potential native memory leak in Auto Loader.\n+ [[SPARK-39084]](https://issues.apache.org/jira/browse/SPARK-39084)[PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion\n+ [[SPARK-38889]](https://issues.apache.org/jira/browse/SPARK-38889)[SQL] Compile boolean column filters to use the bit type for MSSQL data source\n+ [[SPARK-38931]](https://issues.apache.org/jira/browse/SPARK-38931)[SS] Create root dfs directory for RocksDBFileManager with unknown number of keys on 1st checkpoint\n+ Operating system security updates.\n* May 4, 2022 \n+ Upgraded Java AWS SDK from version 1.11.655 to 1.12.1899.\n* April 19, 2022 \n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* April 6, 2022 \n+ [[SPARK-38631]](https://issues.apache.org/jira/browse/SPARK-38631)[CORE] Uses Java-based implementation for un-tarring at Utils.unpack\n+ Operating system security updates.\n* March 22, 2022 \n+ Changed the current working directory of notebooks on High Concurrency clusters with either table access control or credential passthrough enabled to the user\u2019s home directory. 
Previously, the working directory was `/databricks/driver`.\n+ [[SPARK-38437]](https://issues.apache.org/jira/browse/SPARK-38437)[SQL] Lenient serialization of datetime from datasource\n+ [[SPARK-38180]](https://issues.apache.org/jira/browse/SPARK-38180)[SQL] Allow safe up-cast expressions in correlated equality predicates\n+ [[SPARK-38155]](https://issues.apache.org/jira/browse/SPARK-38155)[SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates\n+ [[SPARK-38325]](https://issues.apache.org/jira/browse/SPARK-38325)[SQL] ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()\n* March 14, 2022 \n+ Improved transaction conflict detection for empty transactions in Delta Lake.\n+ [[SPARK-38185]](https://issues.apache.org/jira/browse/SPARK-38185)[SQL] Fix data incorrect if aggregate function is empty\n+ [[SPARK-38318]](https://issues.apache.org/jira/browse/SPARK-38318)[SQL] regression when replacing a dataset view\n+ [[SPARK-38236]](https://issues.apache.org/jira/browse/SPARK-38236)[SQL] Absolute file paths specified in create/alter table are treated as relative\n+ [[SPARK-35937]](https://issues.apache.org/jira/browse/SPARK-35937)[SQL] Extracting date field from timestamp should work in ANSI mode\n+ [[SPARK-34069]](https://issues.apache.org/jira/browse/SPARK-34069)[SQL] Kill barrier tasks should respect `SPARK_JOB_INTERRUPT_ON_CANCEL`\n+ [[SPARK-37707]](https://issues.apache.org/jira/browse/SPARK-37707)[SQL] Allow store assignment between TimestampNTZ and Date/Timestamp\n* February 23, 2022 \n+ [[SPARK-37577]](https://issues.apache.org/jira/browse/SPARK-37577)[SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning\n* February 8, 2022 \n+ [[SPARK-27442]](https://issues.apache.org/jira/browse/SPARK-27442)[SQL] Remove check field name when reading/writing data in parquet.\n+ Operating system security updates.\n* February 1, 2022 \n+ Operating system security updates.\n* January 26, 2022 \n+ Fixed 
a bug where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.\n+ Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.\n* January 19, 2022 \n+ Introduced support for inlining temporary credentials to COPY INTO for loading the source data without requiring SQL ANY\\_FILE permissions\n+ Bug fixes and security enhancements.\n* December 20, 2021 \n+ Fixed a rare bug with Parquet column index based filtering. \n### [Databricks Runtime 10.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id16) \nSee [Databricks Runtime 10.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/10.1.html). \n* June 15, 2022 \n+ [[SPARK-39283]](https://issues.apache.org/jira/browse/SPARK-39283)[CORE] Fix deadlock between TaskMemoryManager and UnsafeExternalSorter.SpillableIterator\n+ [[SPARK-39285]](https://issues.apache.org/jira/browse/SPARK-39285)[SQL] Spark should not check field names when reading files\n+ [[SPARK-34096]](https://issues.apache.org/jira/browse/SPARK-34096)[SQL] Improve performance for nth\\_value ignore nulls over offset window\n* June 2, 2022 \n+ Operating system security updates.\n* May 18, 2022 \n+ Fixes a potential native memory leak in Auto Loader.\n+ [[SPARK-39084]](https://issues.apache.org/jira/browse/SPARK-39084)[PYSPARK] Fix df.rdd.isEmpty() by using TaskContext to stop iterator on task completion\n+ [[SPARK-38889]](https://issues.apache.org/jira/browse/SPARK-38889)[SQL] Compile boolean column filters to use the bit type for MSSQL data source\n+ Operating system security updates.\n* April 19, 2022 \n+ [[SPARK-37270]](https://issues.apache.org/jira/browse/SPARK-37270)[SQL] Fix push foldable into CaseWhen branches if elseValue is empty\n+ Operating system security updates.\n* April 6, 2022 \n+ [[SPARK-38631]](https://issues.apache.org/jira/browse/SPARK-38631)[CORE] Uses Java-based 
implementation for un-tarring at Utils.unpack\n+ Operating system security updates.\n* March 22, 2022 \n+ [[SPARK-38437]](https://issues.apache.org/jira/browse/SPARK-38437)[SQL] Lenient serialization of datetime from datasource\n+ [[SPARK-38180]](https://issues.apache.org/jira/browse/SPARK-38180)[SQL] Allow safe up-cast expressions in correlated equality predicates\n+ [[SPARK-38155]](https://issues.apache.org/jira/browse/SPARK-38155)[SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates\n+ [[SPARK-38325]](https://issues.apache.org/jira/browse/SPARK-38325)[SQL] ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()\n* March 14, 2022 \n+ Improved transaction conflict detection for empty transactions in Delta Lake.\n+ [[SPARK-38185]](https://issues.apache.org/jira/browse/SPARK-38185)[SQL] Fix data incorrect if aggregate function is empty\n+ [[SPARK-38318]](https://issues.apache.org/jira/browse/SPARK-38318)[SQL] regression when replacing a dataset view\n+ [[SPARK-38236]](https://issues.apache.org/jira/browse/SPARK-38236)[SQL] Absolute file paths specified in create/alter table are treated as relative\n+ [[SPARK-35937]](https://issues.apache.org/jira/browse/SPARK-35937)[SQL] Extracting date field from timestamp should work in ANSI mode\n+ [[SPARK-34069]](https://issues.apache.org/jira/browse/SPARK-34069)[SQL] Kill barrier tasks should respect `SPARK_JOB_INTERRUPT_ON_CANCEL`\n+ [[SPARK-37707]](https://issues.apache.org/jira/browse/SPARK-37707)[SQL] Allow store assignment between TimestampNTZ and Date/Timestamp\n* February 23, 2022 \n+ [[SPARK-37577]](https://issues.apache.org/jira/browse/SPARK-37577)[SQL] Fix ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning\n* February 8, 2022 \n+ [[SPARK-27442]](https://issues.apache.org/jira/browse/SPARK-27442)[SQL] Remove check field name when reading/writing data in parquet.\n+ Operating system security updates.\n* February 1, 2022 \n+ Operating system 
security updates.\n* January 26, 2022 \n+ Fixed a bug where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.\n+ Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.\n* January 19, 2022 \n+ Introduced support for inlining temporary credentials to COPY INTO for loading the source data without requiring SQL ANY\\_FILE permissions\n+ Fixed an out of memory issue with query result caching under certain conditions.\n+ Fixed an issue with `USE DATABASE` when a user switches the current catalog to a non-default catalog.\n+ Bug fixes and security enhancements.\n+ Operating system security updates.\n* December 20, 2021 \n+ Fixed a rare bug with Parquet column index based filtering. \n### [Databricks Runtime 10.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id17) \nSee [Databricks Runtime 10.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/10.0.html). 
\n* April 19, 2022 \n+ [[SPARK-37270]](https://issues.apache.org/jira/browse/SPARK-37270)[SQL] Fix push foldable into CaseWhen branches if elseValue is empty\n+ Operating system security updates.\n* April 6, 2022 \n+ [[SPARK-38631]](https://issues.apache.org/jira/browse/SPARK-38631)[CORE] Uses Java-based implementation for un-tarring at Utils.unpack\n+ Operating system security updates.\n* March 22, 2022 \n+ [[SPARK-38437]](https://issues.apache.org/jira/browse/SPARK-38437)[SQL] Lenient serialization of datetime from datasource\n+ [[SPARK-38180]](https://issues.apache.org/jira/browse/SPARK-38180)[SQL] Allow safe up-cast expressions in correlated equality predicates\n+ [[SPARK-38155]](https://issues.apache.org/jira/browse/SPARK-38155)[SQL] Disallow distinct aggregate in lateral subqueries with unsupported predicates\n+ [[SPARK-38325]](https://issues.apache.org/jira/browse/SPARK-38325)[SQL] ANSI mode: avoid potential runtime error in HashJoin.extractKeyExprAt()\n* March 14, 2022 \n+ Improved transaction conflict detection for empty transactions in Delta Lake.\n+ [[SPARK-38185]](https://issues.apache.org/jira/browse/SPARK-38185)[SQL] Fix data incorrect if aggregate function is empty\n+ [[SPARK-38318]](https://issues.apache.org/jira/browse/SPARK-38318)[SQL] regression when replacing a dataset view\n+ [[SPARK-38236]](https://issues.apache.org/jira/browse/SPARK-38236)[SQL] Absolute file paths specified in create/alter table are treated as relative\n+ [[SPARK-35937]](https://issues.apache.org/jira/browse/SPARK-35937)[SQL] Extracting date field from timestamp should work in ANSI mode\n+ [[SPARK-34069]](https://issues.apache.org/jira/browse/SPARK-34069)[SQL] Kill barrier tasks should respect `SPARK_JOB_INTERRUPT_ON_CANCEL`\n+ [[SPARK-37707]](https://issues.apache.org/jira/browse/SPARK-37707)[SQL] Allow store assignment between TimestampNTZ and Date/Timestamp\n* February 23, 2022 \n+ [[SPARK-37577]](https://issues.apache.org/jira/browse/SPARK-37577)[SQL] Fix 
ClassCastException: ArrayType cannot be cast to StructType for Generate Pruning\n* February 8, 2022 \n+ [[SPARK-27442]](https://issues.apache.org/jira/browse/SPARK-27442)[SQL] Remove check field name when reading/writing data in parquet.\n+ [[SPARK-36905]](https://issues.apache.org/jira/browse/SPARK-36905)[SQL] Fix reading hive views without explicit column names\n+ [[SPARK-37859]](https://issues.apache.org/jira/browse/SPARK-37859)[SQL] Fix issue that SQL tables created with JDBC with Spark 3.1 are not readable with 3.2\n+ Operating system security updates.\n* February 1, 2022 \n+ Operating system security updates.\n* January 26, 2022 \n+ Fixed a bug where concurrent transactions on Delta tables could commit in a non-serializable order under certain rare conditions.\n+ Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.\n* January 19, 2022 \n+ Bug fixes and security enhancements.\n+ Operating system security updates.\n* December 20, 2021 \n+ Fixed a rare bug with Parquet column index based filtering.\n* November 9, 2021 \n+ Introduced additional configuration flags to enable fine grained control of ANSI behaviors.\n* November 4, 2021 \n+ Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException\n+ Fixed a race condition that might cause a query failure with an IOException like `java.io.IOException: No FileSystem for scheme` or that might cause modifications to `sparkContext.hadoopConfiguration` to not take effect in queries.\n+ The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.\n* November 30, 2021 \n+ Fixed an issue with timestamp parsing where a timezone string without a colon was considered invalid.\n+ Fixed an out of memory issue with query result caching under certain conditions.\n+ Fixed an issue with `USE DATABASE` when a user switches the current catalog to a non-default catalog. 
\n### [Databricks Runtime 9.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id18) \nSee [Databricks Runtime 9.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/9.0.html). \n* February 8, 2022 \n+ Operating system security updates.\n* February 1, 2022 \n+ Operating system security updates.\n* January 26, 2022 \n+ Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.\n* January 19, 2022 \n+ Bug fixes and security enhancements.\n+ Operating system security updates.\n* November 4, 2021 \n+ Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException\n+ Fixed a race condition that might cause a query failure with an IOException like `java.io.IOException: No FileSystem for scheme` or that might cause modifications to `sparkContext.hadoopConfiguration` to not take effect in queries.\n+ The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.\n* September 22, 2021 \n+ Fixed a bug in cast Spark array with null to string\n* September 15, 2021 \n+ Fixed a race condition that might cause a query failure with an IOException like `java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x`.\n* September 8, 2021 \n+ Added support for schema name (`databaseName.schemaName.tableName` format) as the target table name for Azure Synapse Connector.\n+ Added geometry and geography JDBC types support for Spark SQL.\n+ [[SPARK-33527]](https://issues.apache.org/jira/browse/SPARK-33527)[SQL] Extended the function of decode to be consistent with mainstream databases.\n+ [[SPARK-36532]](https://issues.apache.org/jira/browse/SPARK-36532)[CORE][3.1] Fixed deadlock in `CoarseGrainedExecutorBackend.onDisconnected` to avoid `executorsconnected` to prevent executor shutdown hang.\n* August 25, 2021 \n+ SQL Server driver library was upgraded to 9.2.1.jre8.\n+ Snowflake connector was 
upgraded to 2.9.0.\n+ Fixed broken link to best trial notebook on AutoML experiment page. \n### [Databricks Runtime 8.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id19) \nSee [Databricks Runtime 8.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/8.4.html). \n* January 19, 2022 \n+ Operating system security updates.\n* November 4, 2021 \n+ Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException\n+ Fixed a race condition that might cause a query failure with an IOException like `java.io.IOException: No FileSystem for scheme` or that might cause modifications to `sparkContext.hadoopConfiguration` to not take effect in queries.\n+ The Apache Spark Connector for Delta Sharing was upgraded to 0.2.0.\n* September 22, 2021 \n+ Spark JDBC driver was upgraded to 2.6.19.1030\n+ [[SPARK-36734]](https://issues.apache.org/jira/browse/SPARK-36734)[SQL] Upgrade ORC to 1.5.1\n* September 15, 2021 \n+ Fixed a race condition that might cause a query failure with an IOException like `java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x`.\n+ Operating system security updates.\n* September 8, 2021 \n+ [[SPARK-36532]](https://issues.apache.org/jira/browse/SPARK-36532)[CORE][3.1] Fixed deadlock in `CoarseGrainedExecutorBackend.onDisconnected` to avoid `executorsconnected` to prevent executor shutdown hang.\n* August 25, 2021 \n+ SQL Server driver library was upgraded to 9.2.1.jre8.\n+ Snowflake connector was upgraded to 2.9.0.\n+ Fixes a bug in credential passthrough caused by the new Parquet prefetch optimization, where user\u2019s passthrough credential might not be found during file access.\n* August 11, 2021 \n+ Fixes a RocksDB incompatibility problem that prevents older Databricks Runtime 8.4. 
This fixes forward compatibility for Auto Loader, `COPY INTO`, and stateful streaming applications.\n+ Fixes a bug in Auto Loader with S3 paths when using Auto Loader without a `path` option.\n+ Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.\n+ Fixes a bug when using Auto Loader to read CSV files with mismatching header files. If column names do not match, the column would be filled in with nulls. Now, if a schema is provided, it assumes the schema is the same and will only save column mismatches if rescued data columns are enabled.\n+ Adds a new option called `externalDataSource` into the Azure Synapse connector to remove the `CONTROL` permission requirement on the database for PolyBase reading.\n* July 29, 2021 \n+ [[SPARK-36034]](https://issues.apache.org/jira/browse/SPARK-36034)[BUILD] Rebase datetime in pushed down filters to Parquet\n+ [[SPARK-36163]](https://issues.apache.org/jira/browse/SPARK-36163)[BUILD] Propagate correct JDBC properties in JDBC connector provider and add `connectionProvider` option \n### [Databricks Runtime 8.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id20) \nSee [Databricks Runtime 8.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/8.3.html). 
\n* January 19, 2022 \n+ Operating system security updates.\n* November 4, 2021 \n+ Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException\n+ Fixed a race condition that might cause a query failure with an IOException like `java.io.IOException: No FileSystem for scheme` or that might cause modifications to `sparkContext.hadoopConfiguration` to not take effect in queries.\n* September 22, 2021 \n+ Spark JDBC driver was upgraded to 2.6.19.1030\n* September 15, 2021 \n+ Fixed a race condition that might cause a query failure with an IOException like `java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x`.\n+ Operating system security updates.\n* September 8, 2021 \n+ [[SPARK-35700]](https://issues.apache.org/jira/browse/SPARK-35700)[SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.\n+ [[SPARK-36532]](https://issues.apache.org/jira/browse/SPARK-36532)[CORE][3.1] Fixed deadlock in `CoarseGrainedExecutorBackend.onDisconnected` to avoid `executorsconnected` to prevent executor shutdown hang.\n* August 25, 2021 \n+ SQL Server driver library was upgraded to 9.2.1.jre8.\n+ Snowflake connector was upgraded to 2.9.0.\n+ Fixes a bug in credential passthrough caused by the new Parquet prefetch optimization, where user\u2019s passthrough credential might not be found during file access.\n* August 11, 2021 \n+ Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.\n+ Fixes a bug when using Auto Loader to read CSV files with mismatching header files. If column names do not match, the column would be filled in with nulls. 
Now, if a schema is provided, it assumes the schema is the same and will only save column mismatches if rescued data columns are enabled.\n* July 29, 2021 \n+ Upgrade Databricks Snowflake Spark connector to 2.9.0-spark-3.1\n+ [[SPARK-36034]](https://issues.apache.org/jira/browse/SPARK-36034)[BUILD] Rebase datetime in pushed down filters to Parquet\n+ [[SPARK-36163]](https://issues.apache.org/jira/browse/SPARK-36163)[BUILD] Propagate correct JDBC properties in JDBC connector provider and add `connectionProvider` option\n* July 14, 2021 \n+ Fixed an issue when using column names with dots in Azure Synapse connector.\n+ Introduced `database.schema.table` format for Synapse Connector.\n+ Added support to provide `databaseName.schemaName.tableName` format as the target table instead of only `schemaName.tableName` or `tableName`.\n* June 15, 2021 \n+ Fixed a `NoSuchElementException` bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses\n+ Adds SQL `CREATE GROUP`, `DROP GROUP`, `ALTER GROUP`, `SHOW GROUPS`, and `SHOW USERS` commands. For details, see [Security statements](https://docs.databricks.com/sql/language-manual/index.html#security-statements) and [Show statements](https://docs.databricks.com/sql/language-manual/index.html#show-statements). \n### [Databricks Runtime 8.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id21) \nSee [Databricks Runtime 8.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/8.2.html). 
\n* September 22, 2021 \n+ Operating system security updates.\n* September 15, 2021 \n+ Fixed a race condition that might cause a query failure with an IOException like `java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x`.\n* September 8, 2021 \n+ [[SPARK-35700]](https://issues.apache.org/jira/browse/SPARK-35700)[SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.\n+ [[SPARK-36532]](https://issues.apache.org/jira/browse/SPARK-36532)[CORE][3.1] Fixed deadlock in `CoarseGrainedExecutorBackend.onDisconnected` to avoid `executorsconnected` to prevent executor shutdown hang.\n* August 25, 2021 \n+ Snowflake connector was upgraded to 2.9.0.\n* August 11, 2021 \n+ Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.\n+ [[SPARK-36034]](https://issues.apache.org/jira/browse/SPARK-36034)[SQL] Rebase datetime in pushed down filters to parquet.\n* July 29, 2021 \n+ Upgrade Databricks Snowflake Spark connector to 2.9.0-spark-3.1\n+ [[SPARK-36163]](https://issues.apache.org/jira/browse/SPARK-36163)[BUILD] Propagate correct JDBC properties in JDBC connector provider and add `connectionProvider` option\n* July 14, 2021 \n+ Fixed an issue when using column names with dots in Azure Synapse connector.\n+ Introduced `database.schema.table` format for Synapse Connector.\n+ Added support to provide `databaseName.schemaName.tableName` format as the target table instead of only `schemaName.tableName` or `tableName`.\n+ Fixed a bug that prevents users from time traveling to older available versions with Delta tables.\n* June 15, 2021 \n+ Fixes a `NoSuchElementException` bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses \n* June 7, 2021 \n+ Disable a list of pushdown predicates (StartsWith, EndsWith, Contains, Not(EqualTo()), and DataType) for AWS Glue Catalog since they are not supported 
in Glue yet. \n* May 26, 2021 \n+ Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).\n* April 30, 2021 \n+ Operating system security updates.\n+ [[SPARK-35227]](https://issues.apache.org/jira/browse/SPARK-35227)[BUILD] Update the resolver for spark-packages in SparkSubmit\n+ [[SPARK-34245]](https://issues.apache.org/jira/browse/SPARK-34245)[CORE] Ensure Master removes executors that failed to send finished state\n+ Fixed an OOM issue when Auto Loader reports Structured Streaming progress metrics. \n### [Databricks Runtime 8.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id22) \nSee [Databricks Runtime 8.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/8.1.html). \n* September 22, 2021 \n+ Operating system security updates.\n* September 15, 2021 \n+ Fixed a race condition that might cause a query failure with an IOException like `java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x`.\n* September 8, 2021 \n+ [[SPARK-35700]](https://issues.apache.org/jira/browse/SPARK-35700)[SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.\n+ [[SPARK-36532]](https://issues.apache.org/jira/browse/SPARK-36532)[CORE][3.1] Fixed deadlock in `CoarseGrainedExecutorBackend.onDisconnected` to avoid `executorsconnected` to prevent executor shutdown hang.\n* August 25, 2021 \n+ Snowflake connector was upgraded to 2.9.0.\n* August 11, 2021 \n+ Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.\n+ [[SPARK-36034]](https://issues.apache.org/jira/browse/SPARK-36034)[SQL] Rebase datetime in pushed down filters to parquet.\n* July 29, 2021 \n+ Upgrade Databricks Snowflake Spark connector to 2.9.0-spark-3.1\n+ [[SPARK-36163]](https://issues.apache.org/jira/browse/SPARK-36163)[BUILD] Propagate correct JDBC properties in JDBC connector provider 
and add `connectionProvider` option\n* July 14, 2021 \n+ Fixed an issue when using column names with dots in Azure Synapse connector.\n+ Fixed a bug that prevents users from time traveling to older available versions with Delta tables.\n* June 15, 2021 \n+ Fixes a `NoSuchElementException` bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses \n* June 7, 2021 \n+ Disable a list of pushdown predicates (StartsWith, EndsWith, Contains, Not(EqualTo()), and DataType) for AWS Glue Catalog since they are not supported in Glue yet. \n* May 26, 2021 \n+ Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).\n* April 30, 2021 \n+ Operating system security updates.\n+ [[SPARK-35227]](https://issues.apache.org/jira/browse/SPARK-35227)[BUILD] Update the resolver for spark-packages in SparkSubmit\n+ Fixed an OOM issue when Auto Loader reports Structured Streaming progress metrics.\n* April 27, 2021 \n+ [[SPARK-34245]](https://issues.apache.org/jira/browse/SPARK-34245)[CORE] Ensure Master removes executors that failed to send finished state\n+ [[SPARK-34856]](https://issues.apache.org/jira/browse/SPARK-34856)[SQL] ANSI mode: Allow casting complex types as string type\n+ [[SPARK-35014]](https://issues.apache.org/jira/browse/SPARK-35014) Fix the PhysicalAggregation pattern to not rewrite foldable expressions\n+ [[SPARK-34769]](https://issues.apache.org/jira/browse/SPARK-34769)[SQL] AnsiTypeCoercion: return narrowest convertible type among TypeCollection\n+ [[SPARK-34614]](https://issues.apache.org/jira/browse/SPARK-34614)[SQL] ANSI mode: Casting String to Boolean will throw exception on parse error\n+ [[SPARK-33794]](https://issues.apache.org/jira/browse/SPARK-33794)[SQL] ANSI mode: Fix NextDay expression to throw runtime IllegalArgumentException when receiving invalid input under \n### [Databricks Runtime 8.0 
(unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id23) \nSee [Databricks Runtime 8.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/8.0.html). \n* September 15, 2021 \n+ Fixed a race condition that might cause a query failure with an IOException like `java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x`.\n* August 25, 2021 \n+ Snowflake connector was upgraded to 2.9.0.\n* August 11, 2021 \n+ Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.\n+ [[SPARK-36034]](https://issues.apache.org/jira/browse/SPARK-36034)[SQL] Rebase datetime in pushed down filters to parquet.\n* July 29, 2021 \n+ [[SPARK-36163]](https://issues.apache.org/jira/browse/SPARK-36163)[BUILD] Propagate correct JDBC properties in JDBC connector provider and add `connectionProvider` option\n* July 14, 2021 \n+ Fixed an issue when using column names with dots in Azure Synapse connector.\n+ Fixed a bug that prevents users from time traveling to older available versions with Delta tables. \n* June 7, 2021 \n+ Disable a list of pushdown predicates (StartsWith, EndsWith, Contains, Not(EqualTo()), and DataType) for AWS Glue Catalog since they are not supported in Glue yet. 
\n* May 26, 2021 \n+ Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).\n* April 30, 2021 \n+ Operating system security updates.\n+ [[SPARK-35227]](https://issues.apache.org/jira/browse/SPARK-35227)[BUILD] Update the resolver for spark-packages in SparkSubmit\n+ [[SPARK-34245]](https://issues.apache.org/jira/browse/SPARK-34245)[CORE] Ensure Master removes executors that failed to send finished state\n* March 24, 2021 \n+ [[SPARK-34681]](https://issues.apache.org/jira/browse/SPARK-34681)[SQL] Fix bug for full outer shuffled hash join when building left side with non-equal condition\n+ [[SPARK-34534]](https://issues.apache.org/jira/browse/SPARK-34534) Fix blockIds order when use FetchShuffleBlocks to fetch blocks\n+ [[SPARK-34613]](https://issues.apache.org/jira/browse/SPARK-34613)[SQL] Fix view does not capture disable hint config\n+ Disk caching is enabled by default on i3en instances.\n* March 9, 2021 \n+ [[SPARK-34543]](https://issues.apache.org/jira/browse/SPARK-34543)[SQL] Respect the `spark.sql.caseSensitive` config while resolving partition spec in v1 `SET LOCATION`\n+ [[SPARK-34392]](https://issues.apache.org/jira/browse/SPARK-34392)[SQL] Support ZoneOffset +h:mm in DateTimeUtils.getZoneId\n+ [UI] Fix the href link of Spark DAG Visualization\n+ [[SPARK-34436]](https://issues.apache.org/jira/browse/SPARK-34436)[SQL] DPP support LIKE ANY/ALL expression \n### [Databricks Runtime 7.6 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id24) \nSee [Databricks Runtime 7.6 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.6.html). 
\n* August 11, 2021 \n+ Fixes a bug that misconfigured AWS STS endpoints as Amazon Kinesis endpoints for the Kinesis source.\n+ [[SPARK-36034]](https://issues.apache.org/jira/browse/SPARK-36034)[SQL] Rebase datetime in pushed down filters to parquet.\n* July 29, 2021 \n+ [[SPARK-32998]](https://issues.apache.org/jira/browse/SPARK-32998)[BUILD] Add ability to override default remote repos with internal repos only\n* July 14, 2021 \n+ Fixed a bug that prevents users from time traveling to older available versions with Delta tables.\n* May 26, 2021 \n+ Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).\n* April 30, 2021 \n+ Operating system security updates.\n+ [[SPARK-35227]](https://issues.apache.org/jira/browse/SPARK-35227)[BUILD] Update the resolver for spark-packages in SparkSubmit\n+ [[SPARK-34245]](https://issues.apache.org/jira/browse/SPARK-34245)[CORE] Ensure Master removes executors that failed to send finished state\n* March 24, 2021 \n+ [[SPARK-34768]](https://issues.apache.org/jira/browse/SPARK-34768)[SQL] Respect the default input buffer size in Univocity\n+ [[SPARK-34534]](https://issues.apache.org/jira/browse/SPARK-34534) Fix blockIds order when use FetchShuffleBlocks to fetch blocks\n+ Disk caching is enabled by default on i3en instances.\n* March 9, 2021 \n+ (Azure only) Fixed an Auto Loader bug that can cause NullPointerException when using Databricks Runtime 7.6 to run an old Auto Loader stream created in Databricks Runtime 7.2\n+ [UI] Fix the href link of Spark DAG Visualization\n+ Unknown leaf-node SparkPlan is not handled correctly in SizeInBytesOnlyStatsSparkPlanVisitor\n+ Restore the output schema of `SHOW DATABASES`\n+ [Delta][8.0, 7.6] Fixed calculation bug in file size auto-tuning logic\n+ Disable staleness check for Delta table files in disk cache\n+ [SQL] Use correct dynamic pruning build key when range join hint is present\n+ Disable char type support in non-SQL code path\n+ Avoid NPE in 
DataFrameReader.schema\n+ Fix NPE when EventGridClient response has no entity\n+ Fix a read closed stream bug in Azure Auto Loader\n+ [SQL] Do not generate shuffle partition number advice when AOS is enabled\n* February 24, 2021 \n+ Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.\n+ Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file\u2019s decimal precision and scale are different from the Spark schema.\n+ Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.\n+ Introduced a new configuration `spark.databricks.hive.metastore.init.reloadFunctions.enabled`. This configuration controls the built in Hive initialization. When set to true, Databricks reloads all functions from all databases that users have into `FunctionRegistry`. This is the default behavior in Hive Metastore. When set to false, Databricks disables this process for optimization.\n+ [[SPARK-34212]](https://issues.apache.org/jira/browse/SPARK-34212) Fixed issues related to reading decimal data from Parquet files.\n+ [[SPARK-34260]](https://issues.apache.org/jira/browse/SPARK-34260)[SQL] Fix UnresolvedException when creating temp view twice. \n### [Databricks Runtime 7.5 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id25) \nSee [Databricks Runtime 7.5 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.5.html). 
\n* May 26, 2021 \n+ Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).\n* April 30, 2021 \n+ Operating system security updates.\n+ [[SPARK-35227]](https://issues.apache.org/jira/browse/SPARK-35227)[BUILD] Update the resolver for spark-packages in SparkSubmit\n+ [[SPARK-34245]](https://issues.apache.org/jira/browse/SPARK-34245)[CORE] Ensure Master removes executors that failed to send finished state\n* March 24, 2021 \n+ [[SPARK-34768]](https://issues.apache.org/jira/browse/SPARK-34768)[SQL] Respect the default input buffer size in Univocity\n+ [[SPARK-34534]](https://issues.apache.org/jira/browse/SPARK-34534) Fix blockIds order when use FetchShuffleBlocks to fetch blocks\n+ Disk caching is enabled by default on i3en instances.\n* March 9, 2021 \n+ (Azure only) Fixed an Auto Loader bug that can cause NullPointerException when using Databricks Runtime 7.5 to run an old Auto Loader stream created in Databricks Runtime 7.2.\n+ [UI] Fix the href link of Spark DAG Visualization\n+ Unknown leaf-node SparkPlan is not handled correctly in SizeInBytesOnlyStatsSparkPlanVisitor\n+ Restore the output schema of `SHOW DATABASES`\n+ Disable staleness check for Delta table files in disk cache\n+ [SQL] Use correct dynamic pruning build key when range join hint is present\n+ Disable char type support in non-SQL code path\n+ Avoid NPE in DataFrameReader.schema\n+ Fix NPE when EventGridClient response has no entity\n+ Fix a read closed stream bug in Azure Auto Loader\n* February 24, 2021 \n+ Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.\n+ Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file\u2019s decimal precision and scale are different from the Spark schema.\n+ Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.\n+ 
Introduced a new configuration `spark.databricks.hive.metastore.init.reloadFunctions.enabled`. This configuration controls the built in Hive initialization. When set to true, Databricks reloads all functions from all databases that users have into `FunctionRegistry`. This is the default behavior in Hive Metastore. When set to false, Databricks disables this process for optimization.\n+ [[SPARK-34212]](https://issues.apache.org/jira/browse/SPARK-34212) Fixed issues related to reading decimal data from Parquet files.\n+ [[SPARK-34260]](https://issues.apache.org/jira/browse/SPARK-34260)[SQL] Fix UnresolvedException when creating temp view twice.\n* February 4, 2021 \n+ Fixed a regression that prevents the incremental execution of a query that sets a global limit such as `SELECT * FROM table LIMIT nrows`. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.\n+ Introduced write time checks to the Hive client to prevent the corruption of metadata in the Hive metastore for Delta tables.\n+ Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.\n* January 20, 2021 \n+ Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException and say the column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions: \n- These two DataFrames have common columns, but the output of the self join does not have common columns. For example, `df.join(df.select($\"col\" as \"new_col\"), cond)`\n- The derived DataFrame excludes some columns via select, groupBy, or window.\n- The join condition or the following transformation after the joined Dataframe refers to the non-common columns. 
For example, `df.join(df.drop(\"a\"), df(\"a\") === 1)`\n* January 12, 2021 \n+ Upgrade Azure Storage SDK from 2.3.8 to 2.3.9.\n+ [[SPARK-33593]](https://issues.apache.org/jira/browse/SPARK-33593)[SQL] Vector reader got incorrect data with binary partition value\n+ [[SPARK-33480]](https://issues.apache.org/jira/browse/SPARK-33480)[SQL] updates the error message of char/varchar table insertion length check \n### [Databricks Runtime 7.3 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id26) \nSee [Databricks Runtime 7.3 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.3lts.html). \n* September 10, 2023 \n+ Miscellaneous bug fixes.\n* August 30, 2023 \n+ Operating system security updates.\n* August 15, 2023 \n+ Operating system security updates.\n* June 23, 2023 \n+ Snowflake-jdbc library is upgraded to 3.13.29 to address a security issue.\n+ Operating system security updates.\n* June 15, 2023 \n+ [[SPARK-43413]](https://issues.apache.org/jira/browse/SPARK-43413)[SQL] Fix `IN` subquery `ListQuery` nullability.\n+ Operating system security updates.\n* June 2, 2023 \n+ Fixed an issue in Auto Loader where different source file formats were inconsistent when the provided schema did not include inferred partitions. 
This issue could cause unexpected failures when reading files with missing columns in the inferred partition schema.\n* May 17, 2023 \n+ Operating system security updates.\n* April 25, 2023 \n+ Operating system security updates.\n* April 11, 2023 \n+ [[SPARK-42967]](https://issues.apache.org/jira/browse/SPARK-42967)[CORE] Fix SparkListenerTaskStart.stageAttemptId when a task is started after the stage is cancelled.\n+ Miscellaneous bug fixes.\n* March 29, 2023 \n+ Operating system security updates.\n* March 14, 2023 \n+ Miscellaneous bug fixes.\n* February 28, 2023 \n+ Operating system security updates.\n* February 16, 2023 \n+ Operating system security updates.\n* January 31, 2023 \n+ Table types of JDBC tables are now EXTERNAL by default.\n* January 18, 2023 \n+ Operating system security updates.\n* November 29, 2022 \n+ Miscellaneous bug fixes.\n* November 15, 2022 \n+ Upgraded Apache commons-text to 1.10.0.\n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* November 1, 2022 \n+ [[SPARK-38542]](https://issues.apache.org/jira/browse/SPARK-38542)[SQL] UnsafeHashedRelation should serialize numKeys out\n* October 18, 2022 \n+ Operating system security updates.\n* October 5, 2022 \n+ Miscellaneous bug fixes.\n+ Operating system security updates.\n* September 22, 2022 \n+ [[SPARK-40089]](https://issues.apache.org/jira/browse/SPARK-40089)[SQL] Fix sorting for some Decimal types\n* September 6, 2022 \n+ [[SPARK-35542]](https://issues.apache.org/jira/browse/SPARK-35542)[CORE][ML] Fix: Bucketizer created for multiple columns with parameters splitsArray, inputCols and outputCols can not be loaded after saving it\n+ [[SPARK-40079]](https://issues.apache.org/jira/browse/SPARK-40079)[CORE] Add Imputer inputCols validation for empty input case\n* August 24, 2022 \n+ [[SPARK-39962]](https://issues.apache.org/jira/browse/SPARK-39962)[PYTHON][SQL] Apply projection when group attributes are empty\n+ Operating system security updates.\n* August 9, 2022 \n+ 
Operating system security updates.\n* July 27, 2022 \n+ Make Delta MERGE operation results consistent when source is non-deterministic.\n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* July 13, 2022 \n+ [[SPARK-32680]](https://issues.apache.org/jira/browse/SPARK-32680)[SQL] Don\u2019t Preprocess V2 CTAS with Unresolved Query\n+ Disabled Auto Loader\u2019s use of native cloud APIs for directory listing on Azure.\n+ Operating system security updates.\n* July 5, 2022 \n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* June 2, 2022 \n+ [[SPARK-38918]](https://issues.apache.org/jira/browse/SPARK-38918)[SQL] Nested column pruning should filter out attributes that do not belong to the current relation\n+ Operating system security updates.\n* May 18, 2022 \n+ Upgrade AWS SDK version from 1.11.655 to 1.11.678.\n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* April 19, 2022 \n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* April 6, 2022 \n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* March 14, 2022 \n+ Remove vulnerable classes from log4j 1.2.17 jar\n+ Miscellaneous bug fixes.\n* February 23, 2022 \n+ [[SPARK-37859]](https://issues.apache.org/jira/browse/SPARK-37859)[SQL] Do not check for metadata during schema comparison\n* February 8, 2022 \n+ Upgrade Ubuntu JDK to 1.8.0.312.\n+ Operating system security updates.\n* February 1, 2022 \n+ Operating system security updates.\n* January 26, 2022 \n+ Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.\n* January 19, 2022 \n+ Conda defaults channel is removed from 7.3 ML LTS\n+ Operating system security updates.\n* December 7, 2021 \n+ Operating system security updates.\n* November 4, 2021 \n+ Fixed a bug that could cause Structured Streaming streams to fail with an ArrayIndexOutOfBoundsException\n+ Fixed a race condition that might cause a query failure with an IOException like 
`java.io.IOException: No FileSystem for scheme` or that might cause modifications to `sparkContext.hadoopConfiguration` to not take effect in queries.\n* September 15, 2021 \n+ Fixed a race condition that might cause a query failure with an IOException like `java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_x_piecey of broadcast_x`.\n+ Operating system security updates.\n* September 8, 2021 \n+ [[SPARK-35700]](https://issues.apache.org/jira/browse/SPARK-35700)[SQL][WARMFIX] Read char/varchar orc table when created and written by external systems.\n+ [[SPARK-36532]](https://issues.apache.org/jira/browse/SPARK-36532)[CORE][3.1] Fixed deadlock in `CoarseGrainedExecutorBackend.onDisconnected` to avoid `executorsconnected` to prevent executor shutdown hang.\n* August 25, 2021 \n+ Snowflake connector was upgraded to 2.9.0.\n* July 29, 2021 \n+ [[SPARK-36034]](https://issues.apache.org/jira/browse/SPARK-36034)[BUILD] Rebase datetime in pushed down filters to Parquet\n+ [[SPARK-34508]](https://issues.apache.org/jira/browse/SPARK-34508)[BUILD] Skip `HiveExternalCatalogVersionsSuite` if network is down\n* July 14, 2021 \n+ Introduced `database.schema.table` format for Azure Synapse connector.\n+ Added support to provide `databaseName.schemaName.tableName` format as the target table instead of only `schemaName.tableName` or `tableName`.\n+ Fixed a bug that prevents users from time traveling to older available versions with Delta tables.\n* June 15, 2021 \n+ Fixes a `NoSuchElementException` bug in Delta Lake optimized writes that can happen when writing large amounts of data and encountering executor losses\n+ Updated Python with security patch to fix Python security vulnerability (CVE-2021-3177).\n* April 30, 2021 \n+ Operating system security updates.\n+ [[SPARK-35227]](https://issues.apache.org/jira/browse/SPARK-35227)[BUILD] Update the resolver for spark-packages in SparkSubmit\n+ 
[[SPARK-34245]](https://issues.apache.org/jira/browse/SPARK-34245)[CORE] Ensure Master removes executors that failed to send finished state\n+ [[SPARK-35045]](https://issues.apache.org/jira/browse/SPARK-35045)[SQL] Add an internal option to control input buffer in univocity\n* March 24, 2021 \n+ [[SPARK-34768]](https://issues.apache.org/jira/browse/SPARK-34768)[SQL] Respect the default input buffer size in Univocity\n+ [[SPARK-34534]](https://issues.apache.org/jira/browse/SPARK-34534) Fix blockIds order when use FetchShuffleBlocks to fetch blocks\n+ [[SPARK-33118]](https://issues.apache.org/jira/browse/SPARK-33118)[SQL] CREATE TEMPORARY TABLE fails with location\n+ Disk caching is enabled by default on i3en instances.\n* March 9, 2021 \n+ The updated Azure Blob File System driver for Azure Data Lake Storage Gen2 is now enabled by default. It brings multiple stability improvements.\n+ Fix path separator on Windows for `databricks-connect get-jar-dir`\n+ [UI] Fix the href link of Spark DAG Visualization\n+ [DBCONNECT] Add support for FlatMapCoGroupsInPandas in Databricks Connect 7.3\n+ Restore the output schema of `SHOW DATABASES`\n+ [SQL] Use correct dynamic pruning build key when range join hint is present\n+ Disable staleness check for Delta table files in disk cache\n+ [SQL] Do not generate shuffle partition number advice when AOS is enabled\n* February 24, 2021 \n+ Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.\n+ Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file\u2019s decimal precision and scale are different from the Spark schema.\n+ Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.\n+ Introduced a new configuration `spark.databricks.hive.metastore.init.reloadFunctions.enabled`. 
This configuration controls the built in Hive initialization. When set to true, Databricks reloads all functions from all databases that users have into `FunctionRegistry`. This is the default behavior in Hive Metastore. When set to false, Databricks disables this process for optimization.\n+ [[SPARK-34212]](https://issues.apache.org/jira/browse/SPARK-34212) Fixed issues related to reading decimal data from Parquet files.\n+ [[SPARK-33579]](https://issues.apache.org/jira/browse/SPARK-33579)[UI] Fix executor blank page behind proxy.\n+ [[SPARK-20044]](https://issues.apache.org/jira/browse/SPARK-20044)[UI] Support Spark UI behind front-end reverse proxy using a path prefix.\n+ [[SPARK-33277]](https://issues.apache.org/jira/browse/SPARK-33277)[PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.\n* February 4, 2021 \n+ Fixed a regression that prevents the incremental execution of a query that sets a global limit such as `SELECT * FROM table LIMIT nrows`. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.\n+ Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.\n* January 20, 2021 \n+ Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException and say the column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions: \n- These two DataFrames have common columns, but the output of the self join does not have common columns. For example, `df.join(df.select($\"col\" as \"new_col\"), cond)`\n- The derived DataFrame excludes some columns via select, groupBy, or window.\n- The join condition or the following transformation after the joined Dataframe refers to the non-common columns. 
For example, `df.join(df.drop(\"a\"), df(\"a\") === 1)`\n* January 12, 2021 \n+ Operating system security updates.\n+ [[SPARK-33593]](https://issues.apache.org/jira/browse/SPARK-33593)[SQL] Vector reader got incorrect data with binary partition value\n+ [[SPARK-33677]](https://issues.apache.org/jira/browse/SPARK-33677)[SQL] Skip LikeSimplification rule if pattern contains any escapeChar\n+ [[SPARK-33592]](https://issues.apache.org/jira/browse/SPARK-33592)[ML][PYTHON] Pyspark ML Validator params in estimatorParamMaps may be lost after saving and reloading\n+ [[SPARK-33071]](https://issues.apache.org/jira/browse/SPARK-33071)[SPARK-33536][SQL] Avoid changing dataset\\_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin\n* December 8, 2020 \n+ [[SPARK-33587]](https://issues.apache.org/jira/browse/SPARK-33587)[CORE] Kill the executor on nested fatal errors\n+ [[SPARK-27421]](https://issues.apache.org/jira/browse/SPARK-27421)[SQL] Fix filter for int column and value class java.lang.String when pruning partition column\n+ [[SPARK-33316]](https://issues.apache.org/jira/browse/SPARK-33316)[SQL] Support user provided nullable Avro schema for non-nullable catalyst schema in Avro writing\n+ Spark Jobs launched using Databricks Connect could hang indefinitely with `Executor$TaskRunner.$anonfun$copySessionState` in executor stack trace\n+ Operating system security updates. 
\n* November 20, 2020 \n+ [[SPARK-33404]](https://issues.apache.org/jira/browse/SPARK-33404)[SQL][3.0] Fix incorrect results in `date_trunc` expression\n+ [[SPARK-33339]](https://issues.apache.org/jira/browse/SPARK-33339)[PYTHON] Pyspark application will hang due to non Exception error\n+ [[SPARK-33183]](https://issues.apache.org/jira/browse/SPARK-33183)[SQL][HOTFIX] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts\n+ [[SPARK-33371]](https://issues.apache.org/jira/browse/SPARK-33371)[PYTHON][3.0] Update setup.py and tests for Python 3.9\n+ [[SPARK-33391]](https://issues.apache.org/jira/browse/SPARK-33391)[SQL] element\\_at with CreateArray not respect one based index.\n+ [[SPARK-33306]](https://issues.apache.org/jira/browse/SPARK-33306)[SQL]Timezone is needed when cast date to string\n+ [[SPARK-33260]](https://issues.apache.org/jira/browse/SPARK-33260)[SQL] Fix incorrect results from SortExec when sortOrder is Stream \n* November 5, 2020 \n+ Fix ABFS and WASB locking with regard to `UserGroupInformation.getCurrentUser()`.\n+ Fix an infinite loop bug when Avro reader reads the MAGIC bytes.\n+ Add support for the [USAGE privilege](https://docs.databricks.com/data-governance/table-acls/object-privileges.html#usage-privilege).\n+ Performance improvements for privilege checking in [table access control](https://docs.databricks.com/data-governance/table-acls/index.html).\n* October 13, 2020 \n+ Operating system security updates.\n+ You can read and write from DBFS using the FUSE mount at /dbfs/ when on a high concurrency credential passthrough enabled cluster. 
Regular mounts are supported but mounts that need passthrough credentials are not supported yet.\n+ [[SPARK-32999]](https://issues.apache.org/jira/browse/SPARK-32999)[SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode\n+ [[SPARK-32585]](https://issues.apache.org/jira/browse/SPARK-32585)[SQL] Support scala enumeration in ScalaReflection\n+ Fixed listing directories in FUSE mount that contain file names with invalid XML characters\n+ FUSE mount no longer uses ListMultipartUploads\n* September 29, 2020 \n+ [[SPARK-32718]](https://issues.apache.org/jira/browse/SPARK-32718)[SQL] Remove unnecessary keywords for interval units\n+ [[SPARK-32635]](https://issues.apache.org/jira/browse/SPARK-32635)[SQL] Fix foldable propagation\n+ Add a new config `spark.shuffle.io.decoder.consolidateThreshold`. Set the config value to `Long.MAX_VALUE` to skip the consolidation of netty FrameBuffers, which prevents `java.lang.IndexOutOfBoundsException` in corner cases. \n* April 25, 2023 \n+ Operating system security updates.\n* April 11, 2023 \n+ Miscellaneous bug fixes.\n* March 29, 2023 \n+ Miscellaneous bug fixes.\n* March 14, 2023 \n+ Operating system security updates.\n* February 28, 2023 \n+ Operating system security updates.\n* February 16, 2023 \n+ Operating system security updates.\n* January 31, 2023 \n+ Miscellaneous bug fixes.\n* January 18, 2023 \n+ Operating system security updates.\n* November 29, 2022 \n+ Operating system security updates.\n* November 15, 2022 \n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* November 1, 2022 \n+ Operating system security updates.\n* October 18, 2022 \n+ Operating system security updates.\n* October 5, 2022 \n+ Operating system security updates.\n* August 24, 2022 \n+ Operating system security updates.\n* August 9, 2022 \n+ Operating system security updates.\n* July 27, 2022 \n+ Operating system security updates.\n* July 5, 2022 \n+ Operating system security updates.\n* June 2, 2022 \n+ 
Operating system security updates.\n* May 18, 2022 \n+ Operating system security updates.\n* April 19, 2022 \n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* April 6, 2022 \n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* March 14, 2022 \n+ Miscellaneous bug fixes.\n* February 23, 2022 \n+ Miscellaneous bug fixes.\n* February 8, 2022 \n+ Upgrade Ubuntu JDK to 1.8.0.312.\n+ Operating system security updates.\n* February 1, 2022 \n+ Operating system security updates.\n* January 19, 2022 \n+ Operating system security updates.\n* September 22, 2021 \n+ Operating system security updates.\n* April 30, 2021 \n+ Operating system security updates.\n+ [[SPARK-35227]](https://issues.apache.org/jira/browse/SPARK-35227)[BUILD] Update the resolver for spark-packages in SparkSubmit\n* January 12, 2021 \n+ Operating system security updates.\n* December 8, 2020 \n+ [[SPARK-27421]](https://issues.apache.org/jira/browse/SPARK-27421)[SQL] Fix filter for int column and value class java.lang.String when pruning partition column\n+ Operating system security updates.\n* November 20, 2020\n* November 3, 2020 \n+ Upgraded Java version from 1.8.0\\_252 to 1.8.0\\_265.\n+ Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()\n* October 13, 2020 \n+ Operating system security updates. \n### [Databricks Runtime 6.4 Extended Support (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id27) \nSee [Databricks Runtime 6.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/6.4.html) and [Databricks Runtime 6.4 Extended Support (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/6.4x.html). 
\n* July 5, 2022 \n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* June 2, 2022 \n+ Operating system security updates.\n* May 18, 2022 \n+ Operating system security updates.\n* April 19, 2022 \n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* April 6, 2022 \n+ Operating system security updates.\n+ Miscellaneous bug fixes.\n* March 14, 2022 \n+ Remove vulnerable classes from log4j 1.2.17 jar\n+ Miscellaneous bug fixes.\n* February 23, 2022 \n+ Miscellaneous bug fixes.\n* February 8, 2022 \n+ Upgrade Ubuntu JDK to 1.8.0.312.\n+ Operating system security updates.\n* February 1, 2022 \n+ Operating system security updates.\n* January 26, 2022 \n+ Fixed a bug where the OPTIMIZE command could fail when the ANSI SQL dialect was enabled.\n* January 19, 2022 \n+ Operating system security updates.\n* December 8, 2021 \n+ Operating system security updates.\n* September 22, 2021 \n+ Operating system security updates.\n* June 15, 2021 \n+ [[SPARK-35576]](https://issues.apache.org/jira/browse/SPARK-35576)[SQL] Redact the sensitive info in the result of Set command\n* June 7, 2021 \n+ Add a new config called `spark.sql.maven.additionalRemoteRepositories`, a comma-delimited string config of the optional additional remote maven mirror. The value defaults to `https://maven-central.storage-download.googleapis.com/maven2/`.\n* April 30, 2021 \n+ Operating system security updates.\n+ [[SPARK-35227]](https://issues.apache.org/jira/browse/SPARK-35227)[BUILD] Update the resolver for spark-packages in SparkSubmit \n* April 6, 2021 \n+ Fixed retries added to the S3 client to resolve connection reset issues.\n* March 24, 2021 \n+ Disk caching is enabled by default on i3en instances. 
\n* March 9, 2021 \n+ Port HADOOP-17215 to the Azure Blob File System driver (Support for conditional overwrite).\n+ Fix path separator on Windows for `databricks-connect get-jar-dir`\n+ Added support for Hive metastore versions 2.3.5, 2.3.6, and 2.3.7\n+ Arrow \u201ctotalResultsCollected\u201d reported incorrectly after spill\n* February 24, 2021 \n+ Introduced a new configuration `spark.databricks.hive.metastore.init.reloadFunctions.enabled`. This configuration controls the built in Hive initialization. When set to true, Databricks reloads all functions from all databases that users have into `FunctionRegistry`. This is the default behavior in Hive Metastore. When set to false, Databricks disables this process for optimization.\n* February 4, 2021 \n+ Fixed a regression that prevents the incremental execution of a query that sets a global limit such as `SELECT * FROM table LIMIT nrows`. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.\n+ Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.\n* January 12, 2021 \n+ Operating system security updates.\n* December 8, 2020 \n+ [[SPARK-27421]](https://issues.apache.org/jira/browse/SPARK-27421)[SQL] Fix filter for int column and value class java.lang.String when pruning partition column\n+ [[SPARK-33183]](https://issues.apache.org/jira/browse/SPARK-33183)[SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts\n+ [Runtime 6.4 ML GPU] We previously installed an incorrect version (2.7.8-1+cuda11.1) of NCCL. This release corrects it to 2.4.8-1+cuda10.0 that is compatible with CUDA 10.0.\n+ Operating system security updates. 
\n* November 20, 2020 \n+ [[SPARK-33260]](https://issues.apache.org/jira/browse/SPARK-33260)[SQL] Fix incorrect results from SortExec when sortOrder is Stream\n+ [[SPARK-32635]](https://issues.apache.org/jira/browse/SPARK-32635)[SQL] Fix foldable propagation \n* November 3, 2020 \n+ Upgraded Java version from 1.8.0\\_252 to 1.8.0\\_265.\n+ Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()\n+ Fix an infinite loop bug of Avro reader when reading the MAGIC bytes.\n* October 13, 2020 \n+ Operating system security updates.\n+ [[SPARK-32999]](https://issues.apache.org/jira/browse/SPARK-32999)[SQL][2.4] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode\n+ Fixed listing directories in FUSE mount that contain file names with invalid XML characters\n+ FUSE mount no longer uses ListMultipartUploads\n* September 24, 2020 \n+ Fixed a previous limitation where passthrough on standard cluster would still restrict the filesystem implementation user uses. Now users would be able to access local filesystems without restrictions.\n+ Operating system security updates.\n* September 8, 2020 \n+ A new parameter was created for Azure Synapse Analytics, `maxbinlength`. This parameter is used to control the column length of BinaryType columns, and is translated as `VARBINARY(maxbinlength)`. 
It can be set using `.option(\"maxbinlength\", n)`, where 0 < n <= 8000.\n+ Update Azure Storage SDK to 8.6.4 and enable TCP keep alive on connections made by the WASB driver\n* August 25, 2020 \n+ Fixed ambiguous attribute resolution in self-merge\n* August 18, 2020 \n+ [[SPARK-32431]](https://issues.apache.org/jira/browse/SPARK-32431)[SQL] Check duplicate nested columns in read from in-built datasources\n+ Fixed a race condition in the SQS connector when using Trigger.Once.\n* August 11, 2020 \n+ [[SPARK-28676]](https://issues.apache.org/jira/browse/SPARK-28676)[CORE] Avoid Excessive logging from ContextCleaner\n* August 3, 2020 \n+ You can now use the LDA transform function on a passthrough-enabled cluster.\n+ Operating system security updates.\n* July 7, 2020 \n+ Upgraded Java version from 1.8.0\\_232 to 1.8.0\\_252.\n* April 21, 2020 \n+ [[SPARK-31312]](https://issues.apache.org/jira/browse/SPARK-31312)[SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper\n* April 7, 2020 \n+ To resolve an issue with pandas udf not working with PyArrow 0.15.0 and above, we added an environment variable (`ARROW_PRE_0_15_IPC_FORMAT=1`) to enable support for those versions of PyArrow. See the instructions in [[SPARK-29367]](https://issues.apache.org/jira/browse/SPARK-29367).\n* March 10, 2020 \n+ Optimized autoscaling is now used by default on interactive clusters on the Security plan.\n+ The Snowflake connector (`spark-snowflake_2.11`) included in Databricks Runtime is updated to version 2.5.9. `snowflake-jdbc` is updated to version 3.12.0. \n### [Databricks Runtime 5.5 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id28) \nSee [Databricks Runtime 5.5 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/5.5.html) and [Databricks Runtime 5.5 Extended Support (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/5.5x.html). 
\n* December 8, 2021 \n+ Operating system security updates.\n* September 22, 2021 \n+ Operating system security updates.\n* August 25, 2021 \n+ Downgraded some previously upgraded python packages in 5.5 ML Extended Support Release to maintain better parity with 5.5 ML LTS (now deprecated). See [\\_]/release-notes/runtime/5.5xml.md) for the updated differences between the two versions.\n* June 15, 2021 \n+ [[SPARK-35576]](https://issues.apache.org/jira/browse/SPARK-35576)[SQL] Redact the sensitive info in the result of Set command\n* June 7, 2021 \n+ Add a new config called `spark.sql.maven.additionalRemoteRepositories`, a comma-delimited string config of the optional additional remote maven mirror. The value defaults to `https://maven-central.storage-download.googleapis.com/maven2/`.\n* April 30, 2021 \n+ Operating system security updates.\n+ [[SPARK-35227]](https://issues.apache.org/jira/browse/SPARK-35227)[BUILD] Update the resolver for spark-packages in SparkSubmit \n* April 6, 2021 \n+ Fixed retries added to the S3 client to resolve connection reset issues.\n* March 24, 2021 \n+ Disk caching is enabled by default on i3en instances. \n* March 9, 2021 \n+ Port HADOOP-17215 to the Azure Blob File System driver (Support for conditional overwrite).\n* February 24, 2021 \n+ Introduced a new configuration `spark.databricks.hive.metastore.init.reloadFunctions.enabled`. This configuration controls the built in Hive initialization. When set to true, Databricks reloads all functions from all databases that users have into `FunctionRegistry`. This is the default behavior in Hive Metastore. 
When set to false, Databricks disables this process for optimization.\n* January 12, 2021 \n+ Operating system security updates.\n+ Fix for [[HADOOP-17130]](https://issues.apache.org/jira/browse/HADOOP-17130).\n* December 8, 2020 \n+ [[SPARK-27421]](https://issues.apache.org/jira/browse/SPARK-27421)[SQL] Fix filter for int column and value class java.lang.String when pruning partition column\n+ Operating system security updates. \n* November 20, 2020 \n+ [[SPARK-33260]](https://issues.apache.org/jira/browse/SPARK-33260)[SQL] Fix incorrect results from SortExec when sortOrder is Stream\n+ [[SPARK-32635]](https://issues.apache.org/jira/browse/SPARK-32635)[SQL] Fix foldable propagation \n* October 29, 2020 \n+ Upgraded Java version from 1.8.0\\_252 to 1.8.0\\_265.\n+ Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()\n+ Fix an infinite loop bug of Avro reader when reading the MAGIC bytes.\n* October 13, 2020 \n+ Operating system security updates.\n+ [[SPARK-32999]](https://issues.apache.org/jira/browse/SPARK-32999)[SQL][2.4] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode\n* September 24, 2020 \n+ Operating system security updates.\n* September 8, 2020 \n+ A new parameter was created for Azure Synapse Analytics, `maxbinlength`. This parameter is used to control the column length of BinaryType columns, and is translated as `VARBINARY(maxbinlength)`. 
It can be set using `.option(\"maxbinlength\", n)`, where 0 < n <= 8000.\n* August 18, 2020 \n+ [[SPARK-32431]](https://issues.apache.org/jira/browse/SPARK-32431)[SQL] Check duplicate nested columns in read from in-built datasources\n+ Fixed a race condition in the SQS connector when using Trigger.Once.\n* August 11, 2020 \n+ [[SPARK-28676]](https://issues.apache.org/jira/browse/SPARK-28676)[CORE] Avoid Excessive logging from ContextCleaner\n* August 3, 2020 \n+ Operating system security updates.\n* July 7, 2020 \n+ Upgraded Java version from 1.8.0\\_232 to 1.8.0\\_252.\n* April 21, 2020 \n+ [[SPARK-31312]](https://issues.apache.org/jira/browse/SPARK-31312)[SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper\n* April 7, 2020 \n+ To resolve an issue with pandas UDFs not working with PyArrow 0.15.0 and above, we added an environment variable (`ARROW_PRE_0_15_IPC_FORMAT=1`) to enable support for those versions of PyArrow. See the instructions in [[SPARK-29367]](https://issues.apache.org/jira/browse/SPARK-29367).\n* March 25, 2020 \n+ The Snowflake connector (`spark-snowflake_2.11`) included in Databricks Runtime is updated to version 2.5.9. `snowflake-jdbc` is updated to version 3.12.0.\n* March 10, 2020 \n+ Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed. To avoid encountering this limit, you can prevent stdout from being returned from the driver by setting the `spark.databricks.driver.disableScalaOutput` Spark configuration to `true`. By default the flag value is `false`. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data that is written in the cluster\u2019s log files. 
Setting this flag is recommended only for automated clusters for JAR jobs, because it will disable notebook results.\n* February 18, 2020 \n+ [[SPARK-24783]](https://issues.apache.org/jira/browse/SPARK-24783)[SQL] spark.sql.shuffle.partitions=0 should throw exception\n+ Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled until we have a proper fix.\n* January 28, 2020 \n+ Fixed a bug in S3AFileSystem, whereby `fs.isDirectory(path)` or `fs.getFileStatus(path).isDirectory()` could sometimes incorrectly return `false`. The bug would manifest on paths for which `aws s3 list-objects-v2 --prefix path/ --max-keys 1 --delimiter /` responds with no keys or common prefixes, but `isTruncated = true`. This might happen for directories under which many objects were deleted and versioning was enabled.\n+ [[SPARK-30447]](https://issues.apache.org/jira/browse/SPARK-30447)[SQL] Constant propagation nullability issue.\n* January 14, 2020 \n+ Upgraded Java version from 1.8.0\\_222 to 1.8.0\\_232.\n* November 19, 2019 \n+ [[SPARK-29743]](https://issues.apache.org/jira/browse/SPARK-29743) [SQL] sample should set needCopyResult to true if its child\u2019s needCopyResult is true\n+ The R version was unintentionally upgraded from 3.6.0 to 3.6.1. 
We downgraded it back to 3.6.0.\n* November 5, 2019 \n+ Upgraded Java version from 1.8.0\\_212 to 1.8.0\\_222.\n* October 23, 2019 \n+ [[SPARK-29244]](https://issues.apache.org/jira/browse/SPARK-29244)[CORE] Prevent freed page in BytesToBytesMap free again\n* October 8, 2019 \n+ Server side changes to allow Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires [Simba Apache Spark ODBC driver version 2.6.10](https://databricks.com/spark/odbc-driver-download)).\n+ Fixed an issue affecting the use of the `Optimize` command on table ACL enabled clusters.\n+ Fixed an issue where `pyspark.ml` libraries would fail due to Scala UDF forbidden error on table ACL enabled clusters.\n+ Fixed NullPointerException when checking error code in the WASB client.\n* September 24, 2019 \n+ Improved stability of Parquet writer.\n+ Fixed a problem where a Thrift query cancelled before it started executing could get stuck in the STARTED state.\n* September 10, 2019 \n+ Add thread safe iterator to BytesToBytesMap\n+ [[SPARK-27992]](https://issues.apache.org/jira/browse/SPARK-27992)[[SPARK-28881]](https://issues.apache.org/jira/browse/SPARK-28881)Allow Python to join with connection thread to propagate errors\n+ Fixed a bug affecting certain global aggregation queries.\n+ Improved credential redaction.\n+ [[SPARK-27330]](https://issues.apache.org/jira/browse/SPARK-27330)[SS] support task abort in foreach writer\n+ [[SPARK-28642]](https://issues.apache.org/jira/browse/SPARK-28642)Hide credentials in SHOW CREATE TABLE\n+ [[SPARK-28699]](https://issues.apache.org/jira/browse/SPARK-28699)[SQL] Disable using radix sort for ShuffleExchangeExec in repartition case\n* August 27, 2019 \n+ [[SPARK-20906]](https://issues.apache.org/jira/browse/SPARK-20906)[SQL]Allow user-specified schema in the API `to_avro` with schema registry\n+ [[SPARK-27838]](https://issues.apache.org/jira/browse/SPARK-27838)[SQL] Support user provided non-nullable avro schema for 
nullable catalyst schema without any null record\n+ Improvement on Delta Lake time travel\n+ Fixed an issue affecting certain `transform` expression\n+ Supports broadcast variables when Process Isolation is enabled\n* August 13, 2019 \n+ Delta streaming source should check the latest protocol of a table\n+ [[SPARK-28260]](https://issues.apache.org/jira/browse/SPARK-28260)Add CLOSED state to ExecutionState\n+ [[SPARK-28489]](https://issues.apache.org/jira/browse/SPARK-28489)[SS]Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets\n* July 30, 2019 \n+ [[SPARK-28015]](https://issues.apache.org/jira/browse/SPARK-28015)[SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats\n+ [[SPARK-28308]](https://issues.apache.org/jira/browse/SPARK-28308)[CORE] CalendarInterval sub-second part should be padded before parsing\n+ [[SPARK-27485]](https://issues.apache.org/jira/browse/SPARK-27485)EnsureRequirements.reorder should handle duplicate expressions gracefully\n+ [[SPARK-28355]](https://issues.apache.org/jira/browse/SPARK-28355)[CORE][PYTHON] Use Spark conf for threshold at which UDF is compressed by broadcast \n### [Databricks Light 2.4 Extended Support](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id29) \nSee [Databricks Light 2.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/2.4light.html) and [Databricks Light 2.4 Extended Support (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/2.4lightx.html). \n### [Databricks Runtime 7.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id30) \nSee [Databricks Runtime 7.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.4.html). 
\n* April 30, 2021 \n+ Operating system security updates.\n+ [[SPARK-35227]](https://issues.apache.org/jira/browse/SPARK-35227)[BUILD] Update the resolver for spark-packages in SparkSubmit\n+ [[SPARK-34245]](https://issues.apache.org/jira/browse/SPARK-34245)[CORE] Ensure Master removes executors that failed to send finished state\n+ [[SPARK-35045]](https://issues.apache.org/jira/browse/SPARK-35045)[SQL] Add an internal option to control input buffer in univocity and a configuration for CSV input buffer size\n* March 24, 2021 \n+ [[SPARK-34768]](https://issues.apache.org/jira/browse/SPARK-34768)[SQL] Respect the default input buffer size in Univocity\n+ [[SPARK-34534]](https://issues.apache.org/jira/browse/SPARK-34534) Fix blockIds order when use FetchShuffleBlocks to fetch blocks\n+ Disk caching is enabled by default on i3en instances.\n* March 9, 2021 \n+ The updated Azure Blob File System driver for Azure Data Lake Storage Gen2 is now enabled by default. It brings multiple stability improvements.\n+ [ES-67926][UI] Fix the href link of Spark DAG Visualization\n+ [ES-65064] Restore the output schema of `SHOW DATABASES`\n+ [SC-70522][SQL] Use correct dynamic pruning build key when range join hint is present\n+ [SC-35081] Disable staleness check for Delta table files in disk cache\n+ [SC-70640] Fix NPE when EventGridClient response has no entity\n+ [SC-70220][SQL] Do not generate shuffle partition number advice when AOS is enabled\n* February 24, 2021 \n+ Upgraded the Spark BigQuery connector to v0.18, which introduces various bug fixes and support for Arrow and Avro iterators.\n+ Fixed a correctness issue that caused Spark to return incorrect results when the Parquet file\u2019s decimal precision and scale are different from the Spark schema.\n+ Fixed reading failure issue on Microsoft SQL Server tables that contain spatial data types, by adding geometry and geography JDBC types support for Spark SQL.\n+ Introduced a new configuration 
`spark.databricks.hive.metastore.init.reloadFunctions.enabled`. This configuration controls the built in Hive initialization. When set to true, Databricks reloads all functions from all databases that users have into `FunctionRegistry`. This is the default behavior in Hive Metastore. When set to false, Databricks disables this process for optimization.\n+ [[SPARK-34212]](https://issues.apache.org/jira/browse/SPARK-34212) Fixed issues related to reading decimal data from Parquet files.\n+ [[SPARK-33579]](https://issues.apache.org/jira/browse/SPARK-33579)[UI] Fix executor blank page behind proxy.\n+ [[SPARK-20044]](https://issues.apache.org/jira/browse/SPARK-20044)[UI] Support Spark UI behind front-end reverse proxy using a path prefix.\n+ [[SPARK-33277]](https://issues.apache.org/jira/browse/SPARK-33277)[PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.\n* February 4, 2021 \n+ Fixed a regression that prevents the incremental execution of a query that sets a global limit such as `SELECT * FROM table LIMIT nrows`. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.\n+ Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.\n* January 20, 2021 \n+ Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException and say the column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions: \n- These two DataFrames have common columns, but the output of the self join does not have common columns. For example, `df.join(df.select($\"col\" as \"new_col\"), cond)`\n- The derived DataFrame excludes some columns via select, groupBy, or window.\n- The join condition or the following transformation after the joined Dataframe refers to the non-common columns. 
For example, `df.join(df.drop(\"a\"), df(\"a\") === 1)`\n* January 12, 2021 \n+ Operating system security updates.\n+ [[SPARK-33593]](https://issues.apache.org/jira/browse/SPARK-33593)[SQL] Vector reader got incorrect data with binary partition value\n+ [[SPARK-33677]](https://issues.apache.org/jira/browse/SPARK-33677)[SQL] Skip LikeSimplification rule if pattern contains any escapeChar\n+ [[SPARK-33071]](https://issues.apache.org/jira/browse/SPARK-33071)[SPARK-33536][SQL] Avoid changing dataset\\_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin\n* December 8, 2020 \n+ [[SPARK-33587]](https://issues.apache.org/jira/browse/SPARK-33587)[CORE] Kill the executor on nested fatal errors\n+ [[SPARK-27421]](https://issues.apache.org/jira/browse/SPARK-27421)[SQL] Fix filter for int column and value class java.lang.String when pruning partition column\n+ [[SPARK-33316]](https://issues.apache.org/jira/browse/SPARK-33316)[SQL] Support user provided nullable Avro schema for non-nullable catalyst schema in Avro writing\n+ Operating system security updates. 
\n* November 20, 2020 \n+ [[SPARK-33404]](https://issues.apache.org/jira/browse/SPARK-33404)[SQL][3.0] Fix incorrect results in `date_trunc` expression\n+ [[SPARK-33339]](https://issues.apache.org/jira/browse/SPARK-33339)[PYTHON] Pyspark application will hang due to non Exception error\n+ [[SPARK-33183]](https://issues.apache.org/jira/browse/SPARK-33183)[SQL][HOTFIX] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts\n+ [[SPARK-33371]](https://issues.apache.org/jira/browse/SPARK-33371)[PYTHON][3.0] Update setup.py and tests for Python 3.9\n+ [[SPARK-33391]](https://issues.apache.org/jira/browse/SPARK-33391)[SQL] element\\_at with CreateArray not respect one based index.\n+ [[SPARK-33306]](https://issues.apache.org/jira/browse/SPARK-33306)[SQL]Timezone is needed when cast date to string\n+ [[SPARK-33260]](https://issues.apache.org/jira/browse/SPARK-33260)[SQL] Fix incorrect results from SortExec when sortOrder is Stream\n+ [[SPARK-33272]](https://issues.apache.org/jira/browse/SPARK-33272)[SQL] prune the attributes mapping in QueryPlan.transformUpWithNewOutput \n### [Databricks Runtime 7.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id31) \nSee [Databricks Runtime 7.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.2.html). \n* February 4, 2021 \n+ Fixed a regression that prevents the incremental execution of a query that sets a global limit such as `SELECT * FROM table LIMIT nrows`. The regression was experienced by users running queries via ODBC/JDBC with Arrow serialization enabled.\n+ Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.\n* January 20, 2021 \n+ Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException and say the column is ambiguous in a self join. 
This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions: \n- These two DataFrames have common columns, but the output of the self join does not have common columns. For example, `df.join(df.select($\"col\" as \"new_col\"), cond)`\n- The derived DataFrame excludes some columns via select, groupBy, or window.\n- The join condition or the following transformation after the joined Dataframe refers to the non-common columns. For example, `df.join(df.drop(\"a\"), df(\"a\") === 1)`\n* January 12, 2021 \n+ Operating system security updates.\n+ [[SPARK-33593]](https://issues.apache.org/jira/browse/SPARK-33593)[SQL] Vector reader got incorrect data with binary partition value\n+ [[SPARK-33677]](https://issues.apache.org/jira/browse/SPARK-33677)[SQL] Skip LikeSimplification rule if pattern contains any escapeChar\n+ [[SPARK-33071]](https://issues.apache.org/jira/browse/SPARK-33071)[SPARK-33536][SQL] Avoid changing dataset\\_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin\n* December 8, 2020 \n+ [[SPARK-27421]](https://issues.apache.org/jira/browse/SPARK-27421)[SQL] Fix filter for int column and value class java.lang.String when pruning partition column\n+ [[SPARK-33404]](https://issues.apache.org/jira/browse/SPARK-33404)[SQL] Fix incorrect results in `date_trunc` expression\n+ [[SPARK-33339]](https://issues.apache.org/jira/browse/SPARK-33339)[PYTHON] Pyspark application will hang due to non Exception error\n+ [[SPARK-33183]](https://issues.apache.org/jira/browse/SPARK-33183)[SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts\n+ [[SPARK-33391]](https://issues.apache.org/jira/browse/SPARK-33391)[SQL] element\\_at with CreateArray not respect one based index.\n+ Operating system security updates. 
\n* November 20, 2020 \n+ [[SPARK-33306]](https://issues.apache.org/jira/browse/SPARK-33306)[SQL]Timezone is needed when cast date to string\n+ [[SPARK-33260]](https://issues.apache.org/jira/browse/SPARK-33260)[SQL] Fix incorrect results from SortExec when sortOrder is Stream \n* November 3, 2020 \n+ Upgraded Java version from 1.8.0\\_252 to 1.8.0\\_265.\n+ Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()\n+ Fix an infinite loop bug of Avro reader when reading the MAGIC bytes.\n* October 13, 2020 \n+ Operating system security updates.\n+ [[SPARK-32999]](https://issues.apache.org/jira/browse/SPARK-32999)[SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode\n+ Fixed listing directories in FUSE mount that contain file names with invalid XML characters\n+ FUSE mount no longer uses ListMultipartUploads\n* September 29, 2020 \n+ [[SPARK-28863]](https://issues.apache.org/jira/browse/SPARK-28863)[SQL][WARMFIX] Introduce AlreadyOptimized to prevent reanalysis of V1FallbackWriters\n+ [[SPARK-32635]](https://issues.apache.org/jira/browse/SPARK-32635)[SQL] Fix foldable propagation\n+ Add a new config `spark.shuffle.io.decoder.consolidateThreshold`. Set the config value to `Long.MAX_VALUE` to skip the consolidation of netty FrameBuffers, which prevents `java.lang.IndexOutOfBoundsException` in corner cases.\n* September 24, 2020 \n+ [[SPARK-32764]](https://issues.apache.org/jira/browse/SPARK-32764)[SQL] -0.0 should be equal to 0.0\n+ [[SPARK-32753]](https://issues.apache.org/jira/browse/SPARK-32753)[SQL] Only copy tags to node with no tags when transforming plans\n+ [[SPARK-32659]](https://issues.apache.org/jira/browse/SPARK-32659)[SQL] Fix the data issue of inserted Dynamic Partition Pruning on non-atomic type\n+ Operating system security updates.\n* September 8, 2020 \n+ A new parameter was created for Azure Synapse Analytics, `maxbinlength`. 
This parameter is used to control the column length of BinaryType columns, and is translated as `VARBINARY(maxbinlength)`. It can be set using `.option(\"maxbinlength\", n)`, where 0 < n <= 8000. \n### [Databricks Runtime 7.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id32) \nSee [Databricks Runtime 7.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.1.html). \n* February 4, 2021 \n+ Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.\n* January 20, 2021 \n+ Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException and say the column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions: \n- These two DataFrames have common columns, but the output of the self join does not have common columns. For example, `df.join(df.select($\"col\" as \"new_col\"), cond)`\n- The derived DataFrame excludes some columns via select, groupBy, or window.\n- The join condition or the following transformation after the joined Dataframe refers to the non-common columns. 
For example, `df.join(df.drop(\"a\"), df(\"a\") === 1)`\n* January 12, 2021 \n+ Operating system security updates.\n+ [[SPARK-33593]](https://issues.apache.org/jira/browse/SPARK-33593)[SQL] Vector reader got incorrect data with binary partition value\n+ [[SPARK-33677]](https://issues.apache.org/jira/browse/SPARK-33677)[SQL] Skip LikeSimplification rule if pattern contains any escapeChar\n+ [[SPARK-33071]](https://issues.apache.org/jira/browse/SPARK-33071)[SPARK-33536][SQL] Avoid changing dataset\\_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin\n* December 8, 2020 \n+ [[SPARK-27421]](https://issues.apache.org/jira/browse/SPARK-27421)[SQL] Fix filter for int column and value class java.lang.String when pruning partition column\n+ Spark Jobs launched using Databricks Connect could hang indefinitely with `Executor$TaskRunner.$anonfun$copySessionState` in executor stack trace\n+ Operating system security updates. \n* November 20, 2020 \n+ [[SPARK-33404]](https://issues.apache.org/jira/browse/SPARK-33404)[SQL][3.0] Fix incorrect results in `date_trunc` expression\n+ [[SPARK-33339]](https://issues.apache.org/jira/browse/SPARK-33339)[PYTHON] Pyspark application will hang due to non Exception error\n+ [[SPARK-33183]](https://issues.apache.org/jira/browse/SPARK-33183)[SQL][HOTFIX] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts\n+ [[SPARK-33371]](https://issues.apache.org/jira/browse/SPARK-33371)[PYTHON][3.0] Update setup.py and tests for Python 3.9\n+ [[SPARK-33391]](https://issues.apache.org/jira/browse/SPARK-33391)[SQL] element\\_at with CreateArray not respect one based index.\n+ [[SPARK-33306]](https://issues.apache.org/jira/browse/SPARK-33306)[SQL]Timezone is needed when cast date to string \n* November 3, 2020 \n+ Upgraded Java version from 1.8.0\\_252 to 1.8.0\\_265.\n+ Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()\n+ Fix an infinite loop bug of Avro reader when reading the MAGIC 
bytes.\n* October 13, 2020 \n+ Operating system security updates.\n+ [[SPARK-32999]](https://issues.apache.org/jira/browse/SPARK-32999)[SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode\n+ Fixed listing directories in FUSE mount that contain file names with invalid XML characters\n+ FUSE mount no longer uses ListMultipartUploads\n* September 29, 2020 \n+ [[SPARK-28863]](https://issues.apache.org/jira/browse/SPARK-28863)[SQL][WARMFIX] Introduce AlreadyOptimized to prevent reanalysis of V1FallbackWriters\n+ [[SPARK-32635]](https://issues.apache.org/jira/browse/SPARK-32635)[SQL] Fix foldable propagation\n+ Add a new config `spark.shuffle.io.decoder.consolidateThreshold`. Set the config value to `Long.MAX_VALUE` to skip the consolidation of netty FrameBuffers, which prevents `java.lang.IndexOutOfBoundsException` in corner cases.\n* September 24, 2020 \n+ [[SPARK-32764]](https://issues.apache.org/jira/browse/SPARK-32764)[SQL] -0.0 should be equal to 0.0\n+ [[SPARK-32753]](https://issues.apache.org/jira/browse/SPARK-32753)[SQL] Only copy tags to node with no tags when transforming plans\n+ [[SPARK-32659]](https://issues.apache.org/jira/browse/SPARK-32659)[SQL] Fix the data issue of inserted Dynamic Partition Pruning on non-atomic type\n+ Operating system security updates.\n* September 8, 2020 \n+ A new parameter was created for Azure Synapse Analytics, `maxbinlength`. This parameter is used to control the column length of BinaryType columns, and is translated as `VARBINARY(maxbinlength)`. 
It can be set using `.option(\"maxbinlength\", n)`, where 0 < n <= 8000.\n* August 25, 2020 \n+ [[SPARK-32159]](https://issues.apache.org/jira/browse/SPARK-32159)[SQL] Fix integration between `Aggregator[Array[_], _, _]` and `UnresolvedMapObjects`\n+ [[SPARK-32559]](https://issues.apache.org/jira/browse/SPARK-32559)[SQL] Fix the trim logic in `UTF8String.toInt/toLong`, which didn\u2019t handle non-ASCII characters correctly\n+ [[SPARK-32543]](https://issues.apache.org/jira/browse/SPARK-32543)[R] Remove `arrow::as_tibble` usage in SparkR\n+ [[SPARK-32091]](https://issues.apache.org/jira/browse/SPARK-32091)[CORE] Ignore timeout error when removing blocks on the lost executor\n+ Fixed an issue affecting Azure Synapse connector with MSI credentials\n+ Fixed ambiguous attribute resolution in self-merge\n* August 18, 2020 \n+ [[SPARK-32594]](https://issues.apache.org/jira/browse/SPARK-32594)[SQL] Fix serialization of dates inserted to Hive tables\n+ [[SPARK-32237]](https://issues.apache.org/jira/browse/SPARK-32237)[SQL] Resolve hint in CTE\n+ [[SPARK-32431]](https://issues.apache.org/jira/browse/SPARK-32431)[SQL] Check duplicate nested columns in read from in-built datasources\n+ [[SPARK-32467]](https://issues.apache.org/jira/browse/SPARK-32467)[UI] Avoid encoding URL twice on https redirect\n+ Fixed a race condition in the SQS connector when using Trigger.Once.\n* August 11, 2020 \n+ [[SPARK-32280]](https://issues.apache.org/jira/browse/SPARK-32280)[[SPARK-32372]](https://issues.apache.org/jira/browse/SPARK-32372)[SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan\n+ [[SPARK-32234]](https://issues.apache.org/jira/browse/SPARK-32234)[SQL] Spark SQL commands are failing on selecting the ORC tables\n* August 3, 2020 \n+ You can now use the LDA transform function on a passthrough-enabled cluster. 
\n### [Databricks Runtime 7.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id33) \nSee [Databricks Runtime 7.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.0.html). \n* February 4, 2021 \n+ Fixed a regression that caused DBFS FUSE to fail to start when cluster environment variable configurations contain invalid bash syntax.\n* January 20, 2021 \n+ Fixed a regression in the January 12, 2021 maintenance release that can cause an incorrect AnalysisException and say the column is ambiguous in a self join. This regression happens when a user joins a DataFrame with its derived DataFrame (a so-called self-join) with the following conditions: \n- These two DataFrames have common columns, but the output of the self join does not have common columns. For example, `df.join(df.select($\"col\" as \"new_col\"), cond)`\n- The derived DataFrame excludes some columns via select, groupBy, or window.\n- The join condition or the following transformation after the joined Dataframe refers to the non-common columns. 
For example, `df.join(df.drop(\"a\"), df(\"a\") === 1)`\n* January 12, 2021 \n+ Operating system security updates.\n+ [[SPARK-33593]](https://issues.apache.org/jira/browse/SPARK-33593)[SQL] Vector reader got incorrect data with binary partition value\n+ [[SPARK-33677]](https://issues.apache.org/jira/browse/SPARK-33677)[SQL] Skip LikeSimplification rule if pattern contains any escapeChar\n+ [[SPARK-33071]](https://issues.apache.org/jira/browse/SPARK-33071)[SPARK-33536][SQL] Avoid changing dataset\\_id of LogicalPlan in join() to not break DetectAmbiguousSelfJoin\n* December 8, 2020 \n+ [[SPARK-27421]](https://issues.apache.org/jira/browse/SPARK-27421)[SQL] Fix filter for int column and value class java.lang.String when pruning partition column\n+ [[SPARK-33404]](https://issues.apache.org/jira/browse/SPARK-33404)[SQL] Fix incorrect results in `date_trunc` expression\n+ [[SPARK-33339]](https://issues.apache.org/jira/browse/SPARK-33339)[PYTHON] Pyspark application will hang due to non Exception error\n+ [[SPARK-33183]](https://issues.apache.org/jira/browse/SPARK-33183)[SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts\n+ [[SPARK-33391]](https://issues.apache.org/jira/browse/SPARK-33391)[SQL] element\\_at with CreateArray not respect one based index.\n+ Operating system security updates. 
\n* November 20, 2020 \n+ [[SPARK-33306]](https://issues.apache.org/jira/browse/SPARK-33306)[SQL]Timezone is needed when cast date to string \n* November 3, 2020 \n+ Upgraded Java version from 1.8.0\\_252 to 1.8.0\\_265.\n+ Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()\n+ Fix an infinite loop bug of Avro reader when reading the MAGIC bytes.\n* October 13, 2020 \n+ Operating system security updates.\n+ [[SPARK-32999]](https://issues.apache.org/jira/browse/SPARK-32999)[SQL] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode\n+ Fixed listing directories in FUSE mount that contain file names with invalid XML characters\n+ FUSE mount no longer uses ListMultipartUploads\n* September 29, 2020 \n+ [[SPARK-28863]](https://issues.apache.org/jira/browse/SPARK-28863)[SQL][WARMFIX] Introduce AlreadyOptimized to prevent reanalysis of V1FallbackWriters\n+ [[SPARK-32635]](https://issues.apache.org/jira/browse/SPARK-32635)[SQL] Fix foldable propagation\n+ Add a new config `spark.shuffle.io.decoder.consolidateThreshold`. Set the config value to `Long.MAX_VALUE` to skip the consolidation of netty FrameBuffers, which prevents `java.lang.IndexOutOfBoundsException` in corner cases.\n* September 24, 2020 \n+ [[SPARK-32764]](https://issues.apache.org/jira/browse/SPARK-32764)[SQL] -0.0 should be equal to 0.0\n+ [[SPARK-32753]](https://issues.apache.org/jira/browse/SPARK-32753)[SQL] Only copy tags to node with no tags when transforming plans\n+ [[SPARK-32659]](https://issues.apache.org/jira/browse/SPARK-32659)[SQL] Fix the data issue of inserted Dynamic Partition Pruning on non-atomic type\n+ Operating system security updates.\n* September 8, 2020 \n+ A new parameter was created for Azure Synapse Analytics, `maxbinlength`. This parameter is used to control the column length of BinaryType columns, and is translated as `VARBINARY(maxbinlength)`. 
It can be set using `.option(\"maxbinlength\", n)`, where 0 < n <= 8000.\n* August 25, 2020 \n+ [[SPARK-32159]](https://issues.apache.org/jira/browse/SPARK-32159)[SQL] Fix integration between `Aggregator[Array[_], _, _]` and `UnresolvedMapObjects`\n+ [[SPARK-32559]](https://issues.apache.org/jira/browse/SPARK-32559)[SQL] Fix the trim logic in `UTF8String.toInt/toLong`, which didn\u2019t handle non-ASCII characters correctly\n+ [[SPARK-32543]](https://issues.apache.org/jira/browse/SPARK-32543)[R] Remove `arrow::as_tibble` usage in SparkR\n+ [[SPARK-32091]](https://issues.apache.org/jira/browse/SPARK-32091)[CORE] Ignore timeout error when removing blocks on the lost executor\n+ Fixed an issue affecting Azure Synapse connector with MSI credentials\n+ Fixed ambiguous attribute resolution in self-merge\n* August 18, 2020 \n+ [[SPARK-32594]](https://issues.apache.org/jira/browse/SPARK-32594)[SQL] Fix serialization of dates inserted to Hive tables\n+ [[SPARK-32237]](https://issues.apache.org/jira/browse/SPARK-32237)[SQL] Resolve hint in CTE\n+ [[SPARK-32431]](https://issues.apache.org/jira/browse/SPARK-32431)[SQL] Check duplicate nested columns in read from in-built datasources\n+ [[SPARK-32467]](https://issues.apache.org/jira/browse/SPARK-32467)[UI] Avoid encoding URL twice on https redirect\n+ Fixed a race condition in the SQS connector when using Trigger.Once.\n* August 11, 2020 \n+ [[SPARK-32280]](https://issues.apache.org/jira/browse/SPARK-32280)[[SPARK-32372]](https://issues.apache.org/jira/browse/SPARK-32372)[SQL] ResolveReferences.dedupRight should only rewrite attributes for ancestor nodes of the conflict plan\n+ [[SPARK-32234]](https://issues.apache.org/jira/browse/SPARK-32234)[SQL] Spark SQL commands are failing on selecting the ORC tables\n+ You can now use the LDA transform function on a passthrough-enabled cluster. 
\n### [Databricks Runtime 6.6 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id34) \nSee [Databricks Runtime 6.6 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/6.6.html). \n* November 20, 2020 \n+ [[SPARK-33260]](https://issues.apache.org/jira/browse/SPARK-33260)[SQL] Fix incorrect results from SortExec when sortOrder is Stream\n+ [[SPARK-32635]](https://issues.apache.org/jira/browse/SPARK-32635)[SQL] Fix foldable propagation \n* November 3, 2020 \n+ Upgraded Java version from 1.8.0\\_252 to 1.8.0\\_265.\n+ Fix ABFS and WASB locking with regard to UserGroupInformation.getCurrentUser()\n+ Fix an infinite loop bug of Avro reader when reading the MAGIC bytes.\n* October 13, 2020 \n+ Operating system security updates.\n+ [[SPARK-32999]](https://issues.apache.org/jira/browse/SPARK-32999)[SQL][2.4] Use Utils.getSimpleName to avoid hitting Malformed class name in TreeNode\n+ Fixed listing directories in FUSE mount that contain file names with invalid XML characters\n+ FUSE mount no longer uses ListMultipartUploads\n* September 24, 2020 \n+ Operating system security updates.\n* September 8, 2020 \n+ A new parameter was created for Azure Synapse Analytics, `maxbinlength`. This parameter is used to control the column length of BinaryType columns, and is translated as `VARBINARY(maxbinlength)`. 
It can be set using `.option(\"maxbinlength\", n)`, where 0 < n <= 8000.\n+ Update Azure Storage SDK to 8.6.4 and enable TCP keep alive on connections made by the WASB driver\n* August 25, 2020 \n+ Fixed ambiguous attribute resolution in self-merge\n* August 18, 2020 \n+ [[SPARK-32431]](https://issues.apache.org/jira/browse/SPARK-32431)[SQL] Check duplicate nested columns in read from in-built datasources\n+ Fixed a race condition in the SQS connector when using Trigger.Once.\n* August 11, 2020 \n+ [[SPARK-28676]](https://issues.apache.org/jira/browse/SPARK-28676)[CORE] Avoid Excessive logging from ContextCleaner\n+ [[SPARK-31967]](https://issues.apache.org/jira/browse/SPARK-31967)[UI] Downgrade to vis.js 4.21.0 to fix Jobs UI loading time regression\n* August 3, 2020 \n+ You can now use the LDA transform function on a passthrough-enabled cluster.\n+ Operating system security updates. \n### [Databricks Runtime 6.5 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id35) \nSee [Databricks Runtime 6.5 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/6.5.html). \n* September 24, 2020 \n+ Fixed a previous limitation where passthrough on standard cluster would still restrict the filesystem implementation user uses. Now users would be able to access local filesystems without restrictions.\n+ Operating system security updates.\n* September 8, 2020 \n+ A new parameter was created for Azure Synapse Analytics, `maxbinlength`. This parameter is used to control the column length of BinaryType columns, and is translated as `VARBINARY(maxbinlength)`. 
It can be set using `.option(\"maxbinlength\", n)`, where 0 < n <= 8000.\n+ Update Azure Storage SDK to 8.6.4 and enable TCP keep alive on connections made by the WASB driver\n* August 25, 2020 \n+ Fixed ambiguous attribute resolution in self-merge\n* August 18, 2020 \n+ [[SPARK-32431]](https://issues.apache.org/jira/browse/SPARK-32431)[SQL] Check duplicate nested columns in read from in-built datasources\n+ Fixed a race condition in the SQS connector when using Trigger.Once.\n* August 11, 2020 \n+ [[SPARK-28676]](https://issues.apache.org/jira/browse/SPARK-28676)[CORE] Avoid Excessive logging from ContextCleaner\n* August 3, 2020 \n+ You can now use the LDA transform function on a passthrough-enabled cluster.\n+ Operating system security updates.\n* July 7, 2020 \n+ Upgraded Java version from 1.8.0\\_242 to 1.8.0\\_252.\n* April 21, 2020 \n+ [[SPARK-31312]](https://issues.apache.org/jira/browse/SPARK-31312)[SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper \n### [Databricks Runtime 6.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id36) \nSee [Databricks Runtime 6.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/6.3.html). \n* July 7, 2020 \n+ Upgraded Java version from 1.8.0\\_232 to 1.8.0\\_252.\n* April 21, 2020 \n+ [[SPARK-31312]](https://issues.apache.org/jira/browse/SPARK-31312)[SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper\n* April 7, 2020 \n+ To resolve an issue with pandas udf not working with PyArrow 0.15.0 and above, we added an environment variable (`ARROW_PRE_0_15_IPC_FORMAT=1`) to enable support for those versions of PyArrow. See the instructions in [[SPARK-29367]](https://issues.apache.org/jira/browse/SPARK-29367).\n* March 10, 2020 \n+ The Snowflake connector (`spark-snowflake_2.11`) included in Databricks Runtime is updated to version 2.5.9. 
`snowflake-jdbc` is updated to version 3.12.0.\n* February 18, 2020 \n+ Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled until we have a proper fix.\n* February 11, 2020 \n+ Fixed a bug in our S3 client (S3AFileSystem.java), whereby `fs.isDirectory(path)` or `fs.getFileStatus(path).isDirectory()` could sometimes incorrectly return `false`. The bug would manifest on paths for which `aws s3 list-objects-v2 --prefix path/ --max-keys 1 --delimiter /` responds with no keys or common prefixes, but `isTruncated = true`. This might happen for directories under which many objects were deleted and versioning was enabled.\n+ [[SPARK-24783]](https://issues.apache.org/jira/browse/SPARK-24783)[SQL] spark.sql.shuffle.partitions=0 should throw exception\n+ [[SPARK-30447]](https://issues.apache.org/jira/browse/SPARK-30447)[SQL] Constant propagation nullability issue\n+ [[SPARK-28152]](https://issues.apache.org/jira/browse/SPARK-28152)[SQL] Add a legacy conf for old MsSqlServerDialect numeric mapping\n+ Allowlisted the overwrite function so that the MLModels extends MLWriter could call the function. \n### [Databricks Runtime 6.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id37) \nSee [Databricks Runtime 6.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/6.2.html). \n* April 21, 2020 \n+ [[SPARK-31312]](https://issues.apache.org/jira/browse/SPARK-31312)[SQL] Cache Class instance for the UDF instance in HiveFunctionWrapper\n* April 7, 2020 \n+ To resolve an issue with pandas udf not working with PyArrow 0.15.0 and above, we added an environment variable (`ARROW_PRE_0_15_IPC_FORMAT=1`) to enable support for those versions of PyArrow. 
See the instructions in [[SPARK-29367]](https://issues.apache.org/jira/browse/SPARK-29367).\n* March 25, 2020 \n+ Job output, such as log output emitted to stdout, is subject to a 20 MB size limit. If the total output exceeds this limit, the run is canceled and marked as failed. To avoid encountering this limit, you can prevent stdout from being returned from the driver to the client by setting the `spark.databricks.driver.disableScalaOutput` Spark configuration to `true`. By default, the flag value is `false`. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data that is written in the cluster\u2019s log files. Setting this flag is recommended only for automated clusters for JAR jobs, because it will disable notebook results.\n* March 10, 2020 \n+ The Snowflake connector (`spark-snowflake_2.11`) included in Databricks Runtime is updated to version 2.5.9. `snowflake-jdbc` is updated to version 3.12.0.\n* February 18, 2020 \n+ [[SPARK-24783]](https://issues.apache.org/jira/browse/SPARK-24783)[SQL] spark.sql.shuffle.partitions=0 should throw exception\n+ Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled until we have a proper fix.\n* January 28, 2020 \n+ Fixed a bug in S3AFileSystem, whereby `fs.isDirectory(path)` or `fs.getFileStatus(path).isDirectory()` could sometimes incorrectly return `false`. The bug would manifest on paths for which `aws s3 list-objects-v2 --prefix path/ --max-keys 1 --delimiter /` responds with no keys or common prefixes, but `isTruncated = true`. 
This might happen for directories under which many objects were deleted and versioning was enabled.\n+ Allowlisted ML Model Writers\u2019 overwrite function for clusters enabled for credential passthrough, so that model save can use overwrite mode on credential passthrough clusters.\n+ [[SPARK-30447]](https://issues.apache.org/jira/browse/SPARK-30447)[SQL] Constant propagation nullability issue.\n+ [[SPARK-28152]](https://issues.apache.org/jira/browse/SPARK-28152)[SQL] Add a legacy conf for old MsSqlServerDialect numeric mapping.\n* January 14, 2020 \n+ Upgraded Java version from 1.8.0\\_222 to 1.8.0\\_232.\n* December 10, 2019 \n+ [[SPARK-29904]](https://issues.apache.org/jira/browse/SPARK-29904)[SQL] Parse timestamps in microsecond precision by JSON/CSV data sources. \n### [Databricks Runtime 6.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id38) \nSee [Databricks Runtime 6.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/6.1.html). \n* April 7, 2020 \n+ To resolve an issue with pandas udf not working with PyArrow 0.15.0 and above, we added an environment variable (`ARROW_PRE_0_15_IPC_FORMAT=1`) to enable support for those versions of PyArrow. See the instructions in [[SPARK-29367]](https://issues.apache.org/jira/browse/SPARK-29367).\n* March 25, 2020 \n+ Job output, such as log output emitted to stdout, is subject to a 20 MB size limit. If the total output exceeds this limit, the run is canceled and marked as failed. To avoid encountering this limit, you can prevent stdout from being returned from the driver to the client by setting the `spark.databricks.driver.disableScalaOutput` Spark configuration to `true`. By default, the flag value is `false`. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data that is written in the cluster\u2019s log files. 
Setting this flag is recommended only for automated clusters for JAR jobs, because it will disable notebook results.\n* March 10, 2020 \n+ The Snowflake connector (`spark-snowflake_2.11`) included in Databricks Runtime is updated to version 2.5.9. `snowflake-jdbc` is updated to version 3.12.0.\n* February 18, 2020 \n+ [[SPARK-24783]](https://issues.apache.org/jira/browse/SPARK-24783)[SQL] spark.sql.shuffle.partitions=0 should throw exception\n+ Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread local handling when ADLS client prefetching is enabled. This release disables ADLS Gen2 prefetching when credential passthrough is enabled until we have a proper fix.\n* January 28, 2020 \n+ Fixed a bug in S3AFileSystem, whereby `fs.isDirectory(path)` or `fs.getFileStatus(path).isDirectory()` could sometimes incorrectly return `false`. The bug would manifest on paths for which `aws s3 list-objects-v2 --prefix path/ --max-keys 1 --delimiter /` responds with no keys or common prefixes, but `isTruncated = true`. This might happen for directories under which many objects were deleted and versioning was enabled.\n+ [[SPARK-30447]](https://issues.apache.org/jira/browse/SPARK-30447)[SQL] Constant propagation nullability issue.\n+ [[SPARK-28152]](https://issues.apache.org/jira/browse/SPARK-28152)[SQL] Add a legacy conf for old MsSqlServerDialect numeric mapping.\n* January 14, 2020 \n+ Upgraded Java version from 1.8.0\\_222 to 1.8.0\\_232.\n* November 7, 2019 \n+ [[SPARK-29743]](https://issues.apache.org/jira/browse/SPARK-29743)[SQL] sample should set needCopyResult to true if its child\u2019s needCopyResult is true.\n+ Secrets referenced from Spark configuration properties and environment variables in Public Preview. 
See [Use a secret in a Spark configuration property or environment variable](https://docs.databricks.com/security/secrets/secrets.html#spark-conf-env-var).\n* November 5, 2019 \n+ Fixed a bug in DBFS FUSE to handle mount points that have `//` in their paths.\n+ [[SPARK-29081]](https://issues.apache.org/jira/browse/SPARK-29081) Replace calls to SerializationUtils.clone on properties with a faster implementation\n+ [[SPARK-29244]](https://issues.apache.org/jira/browse/SPARK-29244)[CORE] Prevent freed page in BytesToBytesMap free again\n+ **(6.1 ML)** Library mkl version 2019.4 was installed unintentionally. We downgraded it to mkl version 2019.3 to match Anaconda Distribution 2019.03. \n### [Databricks Runtime 6.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id39) \nSee [Databricks Runtime 6.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/6.0.html). \n* March 25, 2020 \n+ Job output, such as log output emitted to stdout, is subject to a 20 MB size limit. If the total output exceeds this limit, the run is canceled and marked as failed. To avoid encountering this limit, you can prevent stdout from being returned from the driver to the client by setting the `spark.databricks.driver.disableScalaOutput` Spark configuration to `true`. By default, the flag value is `false`. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data that is written in the cluster\u2019s log files. Setting this flag is recommended only for automated clusters for JAR jobs, because it will disable notebook results.\n* February 18, 2020 \n+ Credential passthrough with ADLS Gen2 has a performance degradation due to incorrect thread local handling when ADLS client prefetching is enabled. 
This release disables ADLS Gen2 prefetching when credential passthrough is enabled until we have a proper fix.\n* February 11, 2020 \n+ [[SPARK-24783]](https://issues.apache.org/jira/browse/SPARK-24783)[SQL] spark.sql.shuffle.partitions=0 should throw exception\n* January 28, 2020 \n+ Fixed a bug in S3AFileSystem, whereby `fs.isDirectory(path)` or `fs.getFileStatus(path).isDirectory()` could sometimes incorrectly return `false`. The bug would manifest on paths for which `aws s3 list-objects-v2 --prefix path/ --max-keys 1 --delimiter /` responds with no keys or common prefixes, but `isTruncated = true`. This might happen for directories under which many objects were deleted and versioning was enabled.\n+ [[SPARK-30447]](https://issues.apache.org/jira/browse/SPARK-30447)[SQL] Constant propagation nullability issue.\n+ [[SPARK-28152]](https://issues.apache.org/jira/browse/SPARK-28152)[SQL] Add a legacy conf for old MsSqlServerDialect numeric mapping.\n* January 14, 2020 \n+ Upgraded Java version from 1.8.0\\_222 to 1.8.0\\_232.\n* November 19, 2019 \n+ [[SPARK-29743]](https://issues.apache.org/jira/browse/SPARK-29743) [SQL] sample should set needCopyResult to true if its child\u2019s needCopyResult is true\n* November 5, 2019 \n+ DBFS FUSE supports S3 mounts with canned ACL.\n+ `dbutils.tensorboard.start()` now supports TensorBoard 2.0 (if installed manually).\n+ Fixed a bug in DBFS FUSE to handle mount points having `//` in its path.\n+ [[SPARK-29081]](https://issues.apache.org/jira/browse/SPARK-29081)Replace calls to SerializationUtils.clone on properties with a faster implementation\n* October 23, 2019 \n+ [[SPARK-29244]](https://issues.apache.org/jira/browse/SPARK-29244)[CORE] Prevent freed page in BytesToBytesMap free again\n* October 8, 2019 \n+ Server side changes to allow Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires [Simba Apache Spark ODBC driver version 
2.6.10](https://databricks.com/spark/odbc-driver-download)).\n+ Fixed an issue affecting use of the `Optimize` command on table ACL enabled clusters.\n+ Fixed an issue where `pyspark.ml` libraries would fail due to Scala UDF forbidden error on table ACL enabled clusters.\n+ Fixed NullPointerException when checking error code in the WASB client. \n### [Databricks Runtime 5.4 ML (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id40) \nSee [Databricks Runtime 5.4 for ML (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/5.4ml.html). \n* June 18, 2019 \n+ Improved handling of MLflow active runs in Hyperopt integration\n+ Improved messages in Hyperopt\n+ Updated package `Markdown` from 3.1 to 3.1.1 \n### [Databricks Runtime 5.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id41) \nSee [Databricks Runtime 5.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/5.4.html). 
\n* November 19, 2019 \n+ [[SPARK-29743]](https://issues.apache.org/jira/browse/SPARK-29743) [SQL] sample should set needCopyResult to true if its child\u2019s needCopyResult is true\n* October 8, 2019 \n+ Server side changes to allow Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires Simba Apache Spark ODBC driver update to version 2.6.10).\n+ Fixed NullPointerException when checking error code in the WASB client.\n* September 10, 2019 \n+ Add thread safe iterator to BytesToBytesMap\n+ Fixed a bug affecting certain global aggregation queries.\n+ [[SPARK-27330]](https://issues.apache.org/jira/browse/SPARK-27330)[SS] support task abort in foreach writer\n+ [[SPARK-28642]](https://issues.apache.org/jira/browse/SPARK-28642)Hide credentials in SHOW CREATE TABLE\n+ [[SPARK-28699]](https://issues.apache.org/jira/browse/SPARK-28699)[SQL] Disable using radix sort for ShuffleExchangeExec in repartition case\n+ [[SPARK-28699]](https://issues.apache.org/jira/browse/SPARK-28699)[CORE] Fix a corner case for aborting indeterminate stage\n* August 27, 2019 \n+ Fixed an issue affecting certain `transform` expressions\n* August 13, 2019 \n+ Delta streaming source should check the latest protocol of a table\n+ [[SPARK-28489]](https://issues.apache.org/jira/browse/SPARK-28489)[SS]Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets\n* July 30, 2019 \n+ [[SPARK-28015]](https://issues.apache.org/jira/browse/SPARK-28015)[SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats\n+ [[SPARK-28308]](https://issues.apache.org/jira/browse/SPARK-28308)[CORE] CalendarInterval sub-second part should be padded before parsing\n+ [[SPARK-27485]](https://issues.apache.org/jira/browse/SPARK-27485)EnsureRequirements.reorder should handle duplicate expressions gracefully\n* July 2, 2019 \n+ Upgraded snappy-java from 1.1.7.1 to 1.1.7.3.\n* June 18, 2019 \n+ Improved handling of MLflow active runs 
in MLlib integration\n+ Improved Databricks Advisor message related to using disk caching\n+ Fixed a bug affecting using higher order functions\n+ Fixed a bug affecting Delta metadata queries \n### [Databricks Runtime 5.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id42) \nSee [Databricks Runtime 5.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/5.3.html). \n* November 7, 2019 \n+ [[SPARK-29743]](https://issues.apache.org/jira/browse/SPARK-29743)[SQL] sample should set needCopyResult to true if its child\u2019s needCopyResult is true\n* October 8, 2019 \n+ Server side changes to allow Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires Simba Apache Spark ODBC driver update to version 2.6.10).\n+ Fixed NullPointerException when checking error code in the WASB client.\n* September 10, 2019 \n+ Add thread safe iterator to BytesToBytesMap\n+ Fixed a bug affecting certain global aggregation queries.\n+ [[SPARK-27330]](https://issues.apache.org/jira/browse/SPARK-27330)[SS] support task abort in foreach writer\n+ [[SPARK-28642]](https://issues.apache.org/jira/browse/SPARK-28642)Hide credentials in SHOW CREATE TABLE\n+ [[SPARK-28699]](https://issues.apache.org/jira/browse/SPARK-28699)[SQL] Disable using radix sort for ShuffleExchangeExec in repartition case\n+ [[SPARK-28699]](https://issues.apache.org/jira/browse/SPARK-28699)[CORE] Fix a corner case for aborting indeterminate stage\n* August 27, 2019 \n+ Fixed an issue affecting certain `transform` expressions\n* August 13, 2019 \n+ Delta streaming source should check the latest protocol of a table\n+ [[SPARK-28489]](https://issues.apache.org/jira/browse/SPARK-28489)[SS]Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets\n* July 30, 2019 \n+ [[SPARK-28015]](https://issues.apache.org/jira/browse/SPARK-28015)[SQL] Check stringToDate() consumes entire input 
for the yyyy and yyyy-[m]m formats\n+ [[SPARK-28308]](https://issues.apache.org/jira/browse/SPARK-28308)[CORE] CalendarInterval sub-second part should be padded before parsing\n+ [[SPARK-27485]](https://issues.apache.org/jira/browse/SPARK-27485) EnsureRequirements.reorder should handle duplicate expressions gracefully\n* June 18, 2019 \n+ Improved Databricks Advisor message related to using disk caching\n+ Fixed a bug affecting using higher order functions\n+ Fixed a bug affecting Delta metadata queries\n* May 28, 2019 \n+ Improved the stability of Delta\n+ Tolerate IOExceptions when reading Delta LAST\\_CHECKPOINT file\n+ Added recovery to failed library installation\n* May 7, 2019 \n+ Port HADOOP-15778 (ABFS: Fix client side throttling for read) to Azure Data Lake Storage Gen2 connector\n+ Port HADOOP-16040 (ABFS: Bug fix for tolerateOobAppends configuration) to Azure Data Lake Storage Gen2 connector\n+ Fixed a bug affecting table ACLs\n+ Renamed `fs.s3a.requesterPays.enabled` to `fs.s3a.requester-pays.enabled`\n+ Fixed a race condition when loading a Delta log checksum file\n+ Fixed Delta conflict detection logic to not identify \u201cinsert + overwrite\u201d as pure \u201cappend\u201d operation\n+ Fixed a bug affecting Amazon Kinesis connector\n+ Ensure that disk caching is not disabled when table ACLs are enabled\n+ [SPARK-27494][SS] Null keys/values don\u2019t work in Kafka source v2\n+ [SPARK-27446][R] Use existing spark conf if available.\n+ [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images\n+ [SPARK-27160][SQL] Fix DecimalType when building orc filters\n+ [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager \n### [Databricks Runtime 5.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id43) \nSee [Databricks Runtime 5.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/5.2.html). 
\n* September 10, 2019 \n+ Add thread safe iterator to BytesToBytesMap\n+ Fixed a bug affecting certain global aggregation queries.\n+ [[SPARK-27330]](https://issues.apache.org/jira/browse/SPARK-27330)[SS] support task abort in foreach writer\n+ [[SPARK-28642]](https://issues.apache.org/jira/browse/SPARK-28642)Hide credentials in SHOW CREATE TABLE\n+ [[SPARK-28699]](https://issues.apache.org/jira/browse/SPARK-28699)[SQL] Disable using radix sort for ShuffleExchangeExec in repartition case\n+ [[SPARK-28699]](https://issues.apache.org/jira/browse/SPARK-28699)[CORE] Fix a corner case for aborting indeterminate stage\n* August 27, 2019 \n+ Fixed an issue affecting certain `transform` expressions\n* August 13, 2019 \n+ Delta streaming source should check the latest protocol of a table\n+ [[SPARK-28489]](https://issues.apache.org/jira/browse/SPARK-28489)[SS]Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets\n* July 30, 2019 \n+ [[SPARK-28015]](https://issues.apache.org/jira/browse/SPARK-28015)[SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats\n+ [[SPARK-28308]](https://issues.apache.org/jira/browse/SPARK-28308)[CORE] CalendarInterval sub-second part should be padded before parsing\n+ [[SPARK-27485]](https://issues.apache.org/jira/browse/SPARK-27485)EnsureRequirements.reorder should handle duplicate expressions gracefully\n* July 2, 2019 \n+ Tolerate IOExceptions when reading Delta LAST\\_CHECKPOINT file\n* June 18, 2019 \n+ Improved Databricks Advisor message related to using disk cache\n+ Fixed a bug affecting using higher order functions\n+ Fixed a bug affecting Delta metadata queries\n* May 28, 2019 \n+ Added recovery to failed library installation\n* May 7, 2019 \n+ Port HADOOP-15778 (ABFS: Fix client side throttling for read) to Azure Data Lake Storage Gen2 connector\n+ Port HADOOP-16040 (ABFS: Bug fix for tolerateOobAppends configuration) to Azure Data Lake Storage Gen2 connector\n+ Fixed a race condition when 
loading a Delta log checksum file\n+ Fixed Delta conflict detection logic to not identify \u201cinsert + overwrite\u201d as pure \u201cappend\u201d operation\n+ Fixed a bug affecting Amazon Kinesis connector\n+ Ensure that disk caching is not disabled when table ACLs are enabled\n+ [SPARK-27494][SS] Null keys/values don\u2019t work in Kafka source v2\n+ [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images\n+ [SPARK-27160][SQL] Fix DecimalType when building orc filters\n+ [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager\n* March 26, 2019 \n+ Avoid embedding platform-dependent offsets literally in whole-stage generated code\n+ [[SPARK-26665]](https://issues.apache.org/jira/browse/SPARK-26665)[CORE] Fix a bug that BlockTransferService.fetchBlockSync may hang forever.\n+ [[SPARK-27134]](https://issues.apache.org/jira/browse/SPARK-27134)[SQL] array\\_distinct function does not work correctly with columns containing array of array.\n+ [[SPARK-24669]](https://issues.apache.org/jira/browse/SPARK-24669)[SQL] Invalidate tables in case of DROP DATABASE CASCADE.\n+ [[SPARK-26572]](https://issues.apache.org/jira/browse/SPARK-26572)[SQL] fix aggregate codegen result evaluation.\n+ Fixed a bug affecting certain PythonUDFs.\n* February 26, 2019 \n+ [[SPARK-26864]](https://issues.apache.org/jira/browse/SPARK-26864)[SQL] Query may return incorrect result when python udf is used as a left-semi join condition.\n+ [[SPARK-26887]](https://issues.apache.org/jira/browse/SPARK-26887)[PYTHON] Create datetime.date directly instead of creating datetime64 as intermediate data.\n+ Fixed a bug affecting JDBC/ODBC server.\n+ Fixed a bug affecting PySpark.\n+ Exclude the hidden files when building HadoopRDD.\n+ Fixed a bug in Delta that caused serialization issues.\n* February 12, 2019 \n+ Fixed an issue affecting using Delta with Azure ADLS Gen2 mount points.\n+ Fixed an issue that Spark low level network protocol 
may be broken when sending large RPC error messages with encryption enabled (in [HIPAA compliance features](https://docs.databricks.com/security/privacy/hipaa.html) or when `spark.network.crypto.enabled` is set to true).\n* January 30, 2019 \n+ Fixed the StackOverflowError when putting skew join hint on cached relation.\n+ Fixed the inconsistency between a SQL cache\u2019s cached RDD and its physical plan, which caused incorrect results.\n+ [[SPARK-26706]](https://issues.apache.org/jira/browse/SPARK-26706)[SQL] Fix `illegalNumericPrecedence` for ByteType.\n+ [[SPARK-26709]](https://issues.apache.org/jira/browse/SPARK-26709)[SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.\n+ CSV/JSON data sources should avoid globbing paths when inferring schema.\n+ Fixed constraint inference on Window operator.\n+ Fixed an issue affecting installing egg libraries with clusters having table ACL enabled. \n### [Databricks Runtime 5.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id44) \nSee [Databricks Runtime 5.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/5.1.html). 
\n* August 13, 2019 \n+ Delta streaming source should check the latest protocol of a table\n+ [[SPARK-28489]](https://issues.apache.org/jira/browse/SPARK-28489)[SS] Fix a bug that KafkaOffsetRangeCalculator.getRanges may drop offsets\n* July 30, 2019 \n+ [[SPARK-28015]](https://issues.apache.org/jira/browse/SPARK-28015)[SQL] Check stringToDate() consumes entire input for the yyyy and yyyy-[m]m formats\n+ [[SPARK-28308]](https://issues.apache.org/jira/browse/SPARK-28308)[CORE] CalendarInterval sub-second part should be padded before parsing\n+ [[SPARK-27485]](https://issues.apache.org/jira/browse/SPARK-27485) EnsureRequirements.reorder should handle duplicate expressions gracefully\n* July 2, 2019 \n+ Tolerate IOExceptions when reading Delta LAST\\_CHECKPOINT file\n* June 18, 2019 \n+ Fixed a bug affecting using higher order functions\n+ Fixed a bug affecting Delta metadata queries\n* May 28, 2019 \n+ Added recovery to failed library installation\n* May 7, 2019 \n+ Port HADOOP-15778 (ABFS: Fix client side throttling for read) to Azure Data Lake Storage Gen2 connector\n+ Port HADOOP-16040 (ABFS: Bug fix for tolerateOobAppends configuration) to Azure Data Lake Storage Gen2 connector\n+ Fixed a race condition when loading a Delta log checksum file\n+ Fixed Delta conflict detection logic to not identify \u201cinsert + overwrite\u201d as pure \u201cappend\u201d operation\n+ [SPARK-27494][SS] Null keys/values don\u2019t work in Kafka source v2\n+ [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images\n+ [SPARK-27160][SQL] Fix DecimalType when building orc filters\n+ [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager\n* March 26, 2019 \n+ Avoid embedding platform-dependent offsets literally in whole-stage generated code\n+ Fixed a bug affecting certain PythonUDFs.\n* February 26, 2019 \n+ [[SPARK-26864]](https://issues.apache.org/jira/browse/SPARK-26864)[SQL] Query may return incorrect result when 
python udf is used as a left-semi join condition.\n+ Fixed a bug affecting JDBC/ODBC server.\n+ Exclude the hidden files when building HadoopRDD.\n* February 12, 2019 \n+ Fixed an issue affecting installing egg libraries with clusters having table ACL enabled.\n+ Fixed the inconsistency between a SQL cache\u2019s cached RDD and its physical plan, which caused incorrect results.\n+ [[SPARK-26706]](https://issues.apache.org/jira/browse/SPARK-26706)[SQL] Fix `illegalNumericPrecedence` for ByteType.\n+ [[SPARK-26709]](https://issues.apache.org/jira/browse/SPARK-26709)[SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.\n+ Fixed constraint inference on Window operator.\n+ Fixed an issue that Spark low level network protocol may be broken when sending large RPC error messages with encryption enabled (in [HIPAA compliance features](https://docs.databricks.com/security/privacy/hipaa.html) or when `spark.network.crypto.enabled` is set to true).\n* January 30, 2019 \n+ Fixed an issue that can cause `df.rdd.count()` with UDT to return incorrect answer for certain cases.\n+ Fixed an issue affecting installing wheelhouses.\n+ [[SPARK-26267]](https://issues.apache.org/jira/browse/SPARK-26267) Retry when detecting incorrect offsets from Kafka.\n+ Fixed a bug that affects multiple file stream sources in a streaming query.\n+ Fixed the StackOverflowError when putting skew join hint on cached relation.\n+ Fixed the inconsistency between a SQL cache\u2019s cached RDD and its physical plan, which caused incorrect results.\n* January 8, 2019 \n+ Fixed an issue that caused the error `org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted`.\n+ [[SPARK-26352]](https://issues.apache.org/jira/browse/SPARK-26352) join reordering should not change the order of output attributes.\n+ [[SPARK-26366]](https://issues.apache.org/jira/browse/SPARK-26366) ReplaceExceptWithFilter should consider NULL as False.\n+ Stability improvement for Delta Lake.\n+ Delta 
Lake is enabled.\n+ Databricks IO Cache is enabled for the IO Cache Accelerated instance type. \n### [Databricks Runtime 5.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id45) \nSee [Databricks Runtime 5.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/5.0.html). \n* June 18, 2019 \n+ Fixed a bug affecting using higher order functions\n* May 7, 2019 \n+ Fixed a race condition when loading a Delta log checksum file\n+ Fixed Delta conflict detection logic to not identify \u201cinsert + overwrite\u201d as pure \u201cappend\u201d operation\n+ [SPARK-27494][SS] Null keys/values don\u2019t work in Kafka source v2\n+ [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images\n+ [SPARK-27160][SQL] Fix DecimalType when building orc filters\n+ [SPARK-27338][CORE] Fix deadlock between UnsafeExternalSorter and TaskMemoryManager\n* March 26, 2019 \n+ Avoid embedding platform-dependent offsets literally in whole-stage generated code\n+ Fixed a bug affecting certain PythonUDFs.\n* March 12, 2019 \n+ [[SPARK-26864]](https://issues.apache.org/jira/browse/SPARK-26864)[SQL] Query may return incorrect result when python udf is used as a left-semi join condition.\n* February 26, 2019 \n+ Fixed a bug affecting JDBC/ODBC server.\n+ Exclude the hidden files when building HadoopRDD.\n* February 12, 2019 \n+ Fixed the inconsistency between a SQL cache\u2019s cached RDD and its physical plan, which causes incorrect result.\n+ [[SPARK-26706]](https://issues.apache.org/jira/browse/SPARK-26706)[SQL] Fix `illegalNumericPrecedence` for ByteType.\n+ [[SPARK-26709]](https://issues.apache.org/jira/browse/SPARK-26709)[SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.\n+ Fixed constraint inference on Window operator.\n+ Fixed an issue that Spark low level network protocol may be broken when sending large RPC error messages with encryption enabled (in 
[HIPAA compliance features](https://docs.databricks.com/security/privacy/hipaa.html)) or when `spark.network.crypto.enabled` is set to true).\n* January 30, 2019 \n+ Fixed an issue that can cause `df.rdd.count()` with UDT to return incorrect answer for certain cases.\n+ [[SPARK-26267]](https://issues.apache.org/jira/browse/SPARK-26267)Retry when detecting incorrect offsets from Kafka.\n+ Fixed a bug that affects multiple file stream sources in a streaming query.\n+ Fixed the StackOverflowError when putting skew join hint on cached relation.\n+ Fixed the inconsistency between a SQL cache\u2019s cached RDD and its physical plan, which causes incorrect result.\n* January 8, 2019 \n+ Fixed issue that caused the error `org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted`.\n+ [[SPARK-26352]](https://issues.apache.org/jira/browse/SPARK-26352)join reordering should not change the order of output attributes.\n+ [[SPARK-26366]](https://issues.apache.org/jira/browse/SPARK-26366)ReplaceExceptWithFilter should consider NULL as False.\n+ Stability improvement for Delta Lake.\n+ Delta Lake is enabled.\n+ Databricks IO Cache is enabled for the IO Cache Accelerated instance type.\n* December 18, 2018 \n+ [[SPARK-26293]](https://issues.apache.org/jira/browse/SPARK-26293)Cast exception when having Python UDF in subquery\n+ Fixed an issue affecting certain queries using Join and Limit.\n+ Redacted credentials from RDD names in Spark UI\n* December 6, 2018 \n+ Fixed an issue that caused incorrect query result when using orderBy followed immediately by groupBy with group-by key as the leading part of the sort-by key.\n+ Upgraded Snowflake Connector for Spark from 2.4.9.2-spark\\_2.4\\_pre\\_release to 2.4.10.\n+ Only ignore corrupt files after one or more retries when `spark.sql.files.ignoreCorruptFiles` or `spark.sql.files.ignoreMissingFiles` flag is enabled.\n+ Fixed an issue affecting certain self union queries.\n+ Fixed a bug with the thrift server 
where sessions are sometimes leaked when cancelled.\n+ [[SPARK-26307]](https://issues.apache.org/jira/browse/SPARK-26307)Fixed CTAS when INSERT a partitioned table using Hive SerDe.\n+ [[SPARK-26147]](https://issues.apache.org/jira/browse/SPARK-26147)Python UDFs in join condition fail even when using columns from only one side of join\n+ [[SPARK-26211]](https://issues.apache.org/jira/browse/SPARK-26211)Fix InSet for binary, and struct and array with null.\n+ [[SPARK-26181]](https://issues.apache.org/jira/browse/SPARK-26181)the `hasMinMaxStats` method of `ColumnStatsMap` is not correct.\n+ Fixed an issue affecting installing Python Wheels in environments without Internet access.\n* November 20, 2018 \n+ Fixed an issue that caused a notebook not usable after cancelling a streaming query.\n+ Fixed an issue affecting certain queries using window functions.\n+ Fixed an issue affecting a stream from Delta with multiple schema changes.\n+ Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.\n+ Fixed an issue affecting reading timestamp columns from Redshift. \n### [Databricks Runtime 4.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id46) \nSee [Databricks Runtime 4.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/4.3.html). 
\n* April 9, 2019 \n+ [[SPARK-26665]](https://issues.apache.org/jira/browse/SPARK-26665)[CORE] Fix a bug that can cause BlockTransferService.fetchBlockSync to hang forever.\n+ [[SPARK-24669]](https://issues.apache.org/jira/browse/SPARK-24669)[SQL] Invalidate tables in case of DROP DATABASE CASCADE.\n* March 12, 2019 \n+ Fixed a bug affecting code generation.\n+ Fixed a bug affecting Delta.\n* February 26, 2019 \n+ Fixed a bug affecting JDBC/ODBC server.\n* February 12, 2019 \n+ [[SPARK-26709]](https://issues.apache.org/jira/browse/SPARK-26709)[SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.\n+ Excluding the hidden files when building HadoopRDD.\n+ Fixed Parquet Filter Conversion for IN predicate when its value is empty.\n+ Fixed an issue that Spark low level network protocol may be broken when sending large RPC error messages with encryption enabled (in [HIPAA compliance features](https://docs.databricks.com/security/privacy/hipaa.html)) or when `spark.network.crypto.enabled` is set to true).\n* January 30, 2019 \n+ Fixed an issue that can cause `df.rdd.count()` with UDT to return incorrect answer for certain cases.\n+ Fixed the inconsistency between a SQL cache\u2019s cached RDD and its physical plan, which causes incorrect result.\n* January 8, 2019 \n+ Fixed the issue that causes the error `org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted`.\n+ Redacted credentials from RDD names in Spark UI\n+ [[SPARK-26352]](https://issues.apache.org/jira/browse/SPARK-26352)join reordering should not change the order of output attributes.\n+ [[SPARK-26366]](https://issues.apache.org/jira/browse/SPARK-26366)ReplaceExceptWithFilter should consider NULL as False.\n+ Delta Lake is enabled.\n+ Databricks IO Cache is enabled for the IO Cache Accelerated instance type.\n* December 18, 2018 \n+ [[SPARK-25002]](https://issues.apache.org/jira/browse/SPARK-25002)Avro: revise the output record namespace.\n+ Fixed an issue affecting 
certain queries using Join and Limit.\n+ [[SPARK-26307]](https://issues.apache.org/jira/browse/SPARK-26307)Fixed CTAS when INSERT a partitioned table using Hive SerDe.\n+ Only ignore corrupt files after one or more retries when `spark.sql.files.ignoreCorruptFiles` or `spark.sql.files.ignoreMissingFiles` flag is enabled.\n+ [[SPARK-26181]](https://issues.apache.org/jira/browse/SPARK-26181)the `hasMinMaxStats` method of `ColumnStatsMap` is not correct.\n+ Fixed an issue affecting installing Python Wheels in environments without Internet access.\n+ Fixed a performance issue in query analyzer.\n+ Fixed an issue in PySpark that caused DataFrame actions to fail with a \u201cconnection refused\u201d error.\n+ Fixed an issue affecting certain self union queries.\n* November 20, 2018 \n+ [[SPARK-17916]](https://issues.apache.org/jira/browse/SPARK-17916)[[SPARK-25241]](https://issues.apache.org/jira/browse/SPARK-25241)Fix empty string being parsed as null when nullValue is set.\n+ [[SPARK-25387]](https://issues.apache.org/jira/browse/SPARK-25387)Fix for NPE caused by bad CSV input.\n+ Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.\n+ Fixed an issue affecting reading timestamp columns from Redshift.\n* November 6, 2018 \n+ [[SPARK-25741]](https://issues.apache.org/jira/browse/SPARK-25741)Long URLs are not rendered properly in web UI.\n+ [[SPARK-25714]](https://issues.apache.org/jira/browse/SPARK-25714)Fix Null Handling in the Optimizer rule BooleanSimplification.\n+ Fixed an issue affecting temporary objects cleanup in Synapse Analytics connector.\n+ [[SPARK-25816]](https://issues.apache.org/jira/browse/SPARK-25816)Fix attribute resolution in nested extractors. \n* October 9, 2018 \n+ Fixed a bug affecting the output of running `SHOW CREATE TABLE` on Delta tables.\n+ Fixed a bug affecting `Union` operation. 
\n* September 25, 2018 \n+ [[SPARK-25368]](https://issues.apache.org/jira/browse/SPARK-25368)[SQL] Incorrect constraint inference returns wrong result.\n+ [[SPARK-25402]](https://issues.apache.org/jira/browse/SPARK-25402)[SQL] Null handling in BooleanSimplification.\n+ Fixed `NotSerializableException` in Avro data source.\n* September 11, 2018 \n+ [[SPARK-25214]](https://issues.apache.org/jira/browse/SPARK-25214)[SS] Fix the issue that Kafka v2 source may return duplicated records when `failOnDataLoss=false`.\n+ [[SPARK-24987]](https://issues.apache.org/jira/browse/SPARK-24987)[SS] Fix Kafka consumer leak when no new offsets for TopicPartition.\n+ Filter reduction should handle null value correctly.\n+ Improved stability of execution engine.\n* August 28, 2018 \n+ Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.\n+ [[SPARK-25142]](https://issues.apache.org/jira/browse/SPARK-25142)Add error messages when Python worker could not open socket in `_load_from_socket`.\n* August 23, 2018 \n+ [[SPARK-23935]](https://issues.apache.org/jira/browse/SPARK-23935)mapEntry throws `org.codehaus.commons.compiler.CompileException`.\n+ Fixed nullable map issue in Parquet reader.\n+ [[SPARK-25051]](https://issues.apache.org/jira/browse/SPARK-25051)[SQL] FixNullability should not stop on AnalysisBarrier.\n+ [[SPARK-25081]](https://issues.apache.org/jira/browse/SPARK-25081)Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.\n+ Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.\n+ [[SPARK-25084]](https://issues.apache.org/jira/browse/SPARK-25084)\u201cdistribute by\u201d on multiple columns (wrap in brackets) may lead to codegen issue.\n+ [[SPARK-25096]](https://issues.apache.org/jira/browse/SPARK-25096)Loosen nullability if the cast is force-nullable.\n+ Lowered the default number of threads used by the 
Delta Lake Optimize command, reducing memory overhead and committing data faster.\n+ [[SPARK-25114]](https://issues.apache.org/jira/browse/SPARK-25114)Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX\\_VALUE.\n+ Fixed secret manager redaction when a command partially succeeds. \n### [Databricks Runtime 4.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id47) \nSee [Databricks Runtime 4.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/4.2.html). \n* February 26, 2019 \n+ Fixed a bug affecting JDBC/ODBC server.\n* February 12, 2019 \n+ [[SPARK-26709]](https://issues.apache.org/jira/browse/SPARK-26709)[SQL] OptimizeMetadataOnlyQuery does not handle empty records correctly.\n+ Excluding the hidden files when building HadoopRDD.\n+ Fixed Parquet Filter Conversion for IN predicate when its value is empty.\n+ Fixed an issue that Spark low level network protocol may be broken when sending large RPC error messages with encryption enabled (in [HIPAA compliance features](https://docs.databricks.com/security/privacy/hipaa.html)) or when `spark.network.crypto.enabled` is set to true).\n* January 30, 2019 \n+ Fixed an issue that can cause `df.rdd.count()` with UDT to return incorrect answer for certain cases.\n* January 8, 2019 \n+ Fixed issue that causes the error `org.apache.spark.sql.expressions.Window.rangeBetween(long,long) is not whitelisted`.\n+ Redacted credentials from RDD names in Spark UI\n+ [[SPARK-26352]](https://issues.apache.org/jira/browse/SPARK-26352)join reordering should not change the order of output attributes.\n+ [[SPARK-26366]](https://issues.apache.org/jira/browse/SPARK-26366)ReplaceExceptWithFilter should consider NULL as False.\n+ Delta Lake is enabled.\n+ Databricks IO Cache is enabled for the IO Cache Accelerated instance type.\n* December 18, 2018 \n+ [[SPARK-25002]](https://issues.apache.org/jira/browse/SPARK-25002)Avro: revise 
the output record namespace.\n+ Fixed an issue affecting certain queries using Join and Limit.\n+ [[SPARK-26307]](https://issues.apache.org/jira/browse/SPARK-26307)Fixed CTAS when INSERT a partitioned table using Hive SerDe.\n+ Only ignore corrupt files after one or more retries when `spark.sql.files.ignoreCorruptFiles` or `spark.sql.files.ignoreMissingFiles` flag is enabled.\n+ [[SPARK-26181]](https://issues.apache.org/jira/browse/SPARK-26181)the `hasMinMaxStats` method of `ColumnStatsMap` is not correct.\n+ Fixed an issue affecting installing Python Wheels in environments without Internet access.\n+ Fixed a performance issue in query analyzer.\n+ Fixed an issue in PySpark that caused DataFrame actions to fail with a \u201cconnection refused\u201d error.\n+ Fixed an issue affecting certain self union queries.\n* November 20, 2018 \n+ [[SPARK-17916]](https://issues.apache.org/jira/browse/SPARK-17916)[[SPARK-25241]](https://issues.apache.org/jira/browse/SPARK-25241)Fix empty string being parsed as null when nullValue is set.\n+ Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.\n+ Fixed an issue affecting reading timestamp columns from Redshift.\n* November 6, 2018 \n+ [[SPARK-25741]](https://issues.apache.org/jira/browse/SPARK-25741)Long URLs are not rendered properly in web UI.\n+ [[SPARK-25714]](https://issues.apache.org/jira/browse/SPARK-25714)Fix Null Handling in the Optimizer rule BooleanSimplification. \n* October 9, 2018 \n+ Fixed a bug affecting the output of running `SHOW CREATE TABLE` on Delta tables.\n+ Fixed a bug affecting `Union` operation. 
\n* September 25, 2018 \n+ [[SPARK-25368]](https://issues.apache.org/jira/browse/SPARK-25368)[SQL] Incorrect constraint inference returns wrong result.\n+ [[SPARK-25402]](https://issues.apache.org/jira/browse/SPARK-25402)[SQL] Null handling in BooleanSimplification.\n+ Fixed `NotSerializableException` in Avro data source.\n* September 11, 2018 \n+ [[SPARK-25214]](https://issues.apache.org/jira/browse/SPARK-25214)[SS] Fix the issue that Kafka v2 source may return duplicated records when `failOnDataLoss=false`.\n+ [[SPARK-24987]](https://issues.apache.org/jira/browse/SPARK-24987)[SS] Fix Kafka consumer leak when no new offsets for TopicPartition.\n+ Filter reduction should handle null value correctly.\n* August 28, 2018 \n+ Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.\n* August 23, 2018 \n+ Fixed NoClassDefError for Delta Snapshot\n+ [[SPARK-23935]](https://issues.apache.org/jira/browse/SPARK-23935)mapEntry throws `org.codehaus.commons.compiler.CompileException`.\n+ [[SPARK-24957]](https://issues.apache.org/jira/browse/SPARK-24957)[SQL] Average with decimal followed by aggregation returns wrong result. The incorrect results of AVERAGE might be returned. 
The CAST added in the Average operator will be bypassed if the result of Divide is the same type which it is cast to.\n+ [[SPARK-25081]](https://issues.apache.org/jira/browse/SPARK-25081)Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.\n+ Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.\n+ [[SPARK-25114]](https://issues.apache.org/jira/browse/SPARK-25114)Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX\\_VALUE.\n+ [[SPARK-25084]](https://issues.apache.org/jira/browse/SPARK-25084)\u201cdistribute by\u201d on multiple columns (wrap in brackets) may lead to codegen issue.\n+ [[SPARK-24934]](https://issues.apache.org/jira/browse/SPARK-24934)[SQL] Explicitly allowlist supported types in upper/lower bounds for in-memory partition pruning. When complex data types are used in query filters against cached data, Spark always returns an empty result set. The in-memory stats-based pruning generates incorrect results, because null is set for upper/lower bounds for complex types. The fix is to not use in-memory stats-based pruning for complex types.\n+ Fixed secret manager redaction when a command partially succeeds.\n+ Fixed nullable map issue in Parquet reader.\n* August 2, 2018 \n+ Added writeStream.table API in Python.\n+ Fixed an issue affecting Delta checkpointing.\n+ [[SPARK-24867]](https://issues.apache.org/jira/browse/SPARK-24867)[SQL] Add AnalysisBarrier to DataFrameWriter. SQL cache is not being used when using DataFrameWriter to write a DataFrame with UDF. 
This is a regression caused by the changes we made in AnalysisBarrier, since not all the Analyzer rules are idempotent.\n+ Fixed an issue that could cause `mergeInto` command to produce incorrect results.\n+ Improved stability on accessing Azure Data Lake Storage Gen1.\n+ [[SPARK-24809]](https://issues.apache.org/jira/browse/SPARK-24809)Serializing LongHashedRelation in executor may result in data error.\n+ [[SPARK-24878]](https://issues.apache.org/jira/browse/SPARK-24878)[SQL] Fix reverse function for array type of primitive type containing null.\n* July 11, 2018 \n+ Fixed a bug in query execution that would cause aggregations on decimal columns with different precisions to return incorrect results in some cases.\n+ Fixed a `NullPointerException` bug that was thrown during advanced aggregation operations like grouping sets. \n### [Databricks Runtime 4.1 ML (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id48) \nSee [Databricks Runtime 4.1 ML (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/4.1ml.html). \n* July 31, 2018 \n+ Added Azure Synapse Analytics to ML Runtime 4.1\n+ Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs from the case of that column in the schema of the table.\n+ Fixed a bug affecting Spark SQL execution engine.\n+ Fixed a bug affecting code generation.\n+ Fixed a bug (`java.lang.NoClassDefFoundError`) affecting Delta Lake.\n+ Improved error handling in Delta Lake.\n+ Fixed a bug that caused incorrect data skipping statistics to be collected for string columns 32 characters or greater. \n### [Databricks Runtime 4.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id49) \nSee [Databricks Runtime 4.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/4.1.html). 
\n* January 8, 2019 \n+ [[SPARK-26366]](https://issues.apache.org/jira/browse/SPARK-26366)ReplaceExceptWithFilter should consider NULL as False.\n+ Delta Lake is enabled.\n* December 18, 2018 \n+ [[SPARK-25002]](https://issues.apache.org/jira/browse/SPARK-25002)Avro: revise the output record namespace.\n+ Fixed an issue affecting certain queries using Join and Limit.\n+ [[SPARK-26307]](https://issues.apache.org/jira/browse/SPARK-26307)Fixed CTAS when INSERT a partitioned table using Hive SerDe.\n+ Only ignore corrupt files after one or more retries when `spark.sql.files.ignoreCorruptFiles` or `spark.sql.files.ignoreMissingFiles` flag is enabled.\n+ Fixed an issue affecting installing Python Wheels in environments without Internet access.\n+ Fixed an issue in PySpark that caused DataFrame actions to fail with a \u201cconnection refused\u201d error.\n+ Fixed an issue affecting certain self union queries.\n* November 20, 2018 \n+ [[SPARK-17916]](https://issues.apache.org/jira/browse/SPARK-17916)[[SPARK-25241]](https://issues.apache.org/jira/browse/SPARK-25241)Fix empty string being parsed as null when nullValue is set.\n+ Fixed an issue affecting certain aggregation queries with Left Semi/Anti joins.\n* November 6, 2018 \n+ [[SPARK-25741]](https://issues.apache.org/jira/browse/SPARK-25741)Long URLs are not rendered properly in web UI.\n+ [[SPARK-25714]](https://issues.apache.org/jira/browse/SPARK-25714)Fix Null Handling in the Optimizer rule BooleanSimplification. \n* October 9, 2018 \n+ Fixed a bug affecting the output of running `SHOW CREATE TABLE` on Delta tables.\n+ Fixed a bug affecting `Union` operation. 
\n* September 25, 2018 \n+ [[SPARK-25368]](https://issues.apache.org/jira/browse/SPARK-25368)[SQL] Incorrect constraint inference returns wrong result.\n+ [[SPARK-25402]](https://issues.apache.org/jira/browse/SPARK-25402)[SQL] Null handling in BooleanSimplification.\n+ Fixed `NotSerializableException` in Avro data source.\n* September 11, 2018 \n+ [[SPARK-25214]](https://issues.apache.org/jira/browse/SPARK-25214)[SS] Fix the issue that Kafka v2 source may return duplicated records when `failOnDataLoss=false`.\n+ [[SPARK-24987]](https://issues.apache.org/jira/browse/SPARK-24987)[SS] Fix Kafka consumer leak when no new offsets for TopicPartition.\n+ Filter reduction should handle null value correctly.\n* August 28, 2018 \n+ Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.\n+ [[SPARK-25084]](https://issues.apache.org/jira/browse/SPARK-25084)\u201cdistribute by\u201d on multiple columns (wrap in brackets) may lead to codegen issue.\n+ [[SPARK-25114]](https://issues.apache.org/jira/browse/SPARK-25114)Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX\\_VALUE.\n* August 23, 2018 \n+ Fixed NoClassDefError for Delta Snapshot.\n+ [[SPARK-24957]](https://issues.apache.org/jira/browse/SPARK-24957)[SQL] Average with decimal followed by aggregation returns wrong result. The incorrect results of AVERAGE might be returned. The CAST added in the Average operator will be bypassed if the result of Divide is the same type which it is cast to.\n+ Fixed nullable map issue in Parquet reader.\n+ [[SPARK-24934]](https://issues.apache.org/jira/browse/SPARK-24934)[SQL] Explicitly allowlist supported types in upper/lower bounds for in-memory partition pruning. When complex data types are used in query filters against cached data, Spark always returns an empty result set. 
The in-memory stats-based pruning generates incorrect results, because null is set for upper/lower bounds for complex types. The fix is to not use in-memory stats-based pruning for complex types.\n+ [[SPARK-25081]](https://issues.apache.org/jira/browse/SPARK-25081)Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.\n+ Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.\n+ Fixed secret manager redaction when a command partially succeeds.\n* August 2, 2018 \n+ [[SPARK-24613]](https://issues.apache.org/jira/browse/SPARK-24613)[SQL] Cache with UDF could not be matched with subsequent dependent caches. Wraps the logical plan with an AnalysisBarrier for execution plan compilation in CacheManager, in order to avoid the plan being analyzed again. This is also a regression of Spark 2.3.\n+ Fixed a Synapse Analytics connector issue affecting timezone conversion for writing DateType data.\n+ Fixed an issue affecting Delta checkpointing.\n+ Fixed an issue that could cause `mergeInto` command to produce incorrect results.\n+ [[SPARK-24867]](https://issues.apache.org/jira/browse/SPARK-24867)[SQL] Add AnalysisBarrier to DataFrameWriter. SQL cache is not being used when using DataFrameWriter to write a DataFrame with UDF. 
This is a regression caused by the changes we made in AnalysisBarrier, since not all the Analyzer rules are idempotent.\n+ [[SPARK-24809]](https://issues.apache.org/jira/browse/SPARK-24809)Serializing LongHashedRelation in executor may result in data error.\n* July 11, 2018 \n+ Fixed a bug in query execution that would cause aggregations on decimal columns with different precisions to return incorrect results in some cases.\n+ Fixed a `NullPointerException` bug that was thrown during advanced aggregation operations like grouping sets.\n* June 28, 2018 \n+ Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs from the case of that column in the schema of the table. \n* May 29, 2018 \n+ Fixed a bug affecting Spark SQL execution engine.\n+ Fixed a bug affecting code generation.\n+ Fixed a bug (`java.lang.NoClassDefFoundError`) affecting Delta Lake.\n+ Improved error handling in Delta Lake.\n* May 15, 2018 \n+ Fixed a bug that caused incorrect data skipping statistics to be collected for string columns 32 characters or greater. \n### [Databricks Runtime 4.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id50) \nSee [Databricks Runtime 4.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/4.0.html). \n* November 6, 2018 \n+ [[SPARK-25714]](https://issues.apache.org/jira/browse/SPARK-25714)Fix Null Handling in the Optimizer rule BooleanSimplification. \n* October 9, 2018 \n+ Fixed a bug affecting `Union` operation. 
\n* September 25, 2018 \n+ [[SPARK-25368]](https://issues.apache.org/jira/browse/SPARK-25368)[SQL] Incorrect constraint inference returns wrong result.\n+ [[SPARK-25402]](https://issues.apache.org/jira/browse/SPARK-25402)[SQL] Null handling in BooleanSimplification.\n+ Fixed `NotSerializableException` in Avro data source.\n* September 11, 2018 \n+ Filter reduction should handle null value correctly.\n* August 28, 2018 \n+ Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.\n* August 23, 2018 \n+ Fixed nullable map issue in Parquet reader.\n+ Fixed secret manager redaction when a command partially succeeds.\n+ Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.\n+ [[SPARK-25081]](https://issues.apache.org/jira/browse/SPARK-25081)Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.\n+ [[SPARK-25114]](https://issues.apache.org/jira/browse/SPARK-25114)Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX\\_VALUE.\n* August 2, 2018 \n+ [[SPARK-24452]](https://issues.apache.org/jira/browse/SPARK-24452)Avoid possible overflow in int add or multiply.\n+ [[SPARK-24588]](https://issues.apache.org/jira/browse/SPARK-24588)Streaming join should require HashClusteredPartitioning from children.\n+ Fixed an issue that could cause `mergeInto` command to produce incorrect results.\n+ [[SPARK-24867]](https://issues.apache.org/jira/browse/SPARK-24867)[SQL] Add AnalysisBarrier to DataFrameWriter. SQL cache is not being used when using DataFrameWriter to write a DataFrame with UDF. 
This is a regression caused by the changes we made in AnalysisBarrier, since not all the Analyzer rules are idempotent.\n+ [[SPARK-24809]](https://issues.apache.org/jira/browse/SPARK-24809)Serializing LongHashedRelation in executor may result in data error.\n* June 28, 2018 \n+ Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs from the case of that column in the schema of the table. \n* May 31, 2018 \n+ Fixed a bug affecting Spark SQL execution engine.\n+ Improved error handling in Delta Lake. \n* May 17, 2018 \n+ Bug fixes for Databricks secret management.\n+ Improved stability on reading data stored in Azure Data Lake Store.\n+ Fixed a bug affecting RDD caching.\n+ Fixed a bug affecting Null-safe Equal in Spark SQL.\n* April 24, 2018 \n+ Upgraded Azure Data Lake Store SDK from 2.0.11 to 2.2.8 to improve the stability of access to Azure Data Lake Store.\n+ Fixed a bug affecting the insertion of overwrites to partitioned Hive tables when `spark.databricks.io.hive.fastwriter.enabled` is `false`.\n+ Fixed an issue that failed task serialization.\n+ Improved Delta Lake stability.\n* March 14, 2018 \n+ Prevent unnecessary metadata updates when writing into Delta Lake.\n+ Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files. \n### [Databricks Runtime 3.5 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id51) \nSee [Databricks Runtime 3.5 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/3.5.html). 
\n* November 7, 2019 \n+ [[SPARK-29743]](https://issues.apache.org/jira/browse/SPARK-29743)[SQL] sample should set needCopyResult to true if its child\u2019s needCopyResult is true\n* October 8, 2019 \n+ Server side changes to allow Simba Apache Spark ODBC driver to reconnect and continue after a connection failure during fetching results (requires Simba Apache Spark ODBC driver update to version 2.6.10).\n* September 10, 2019 \n+ [[SPARK-28699]](https://issues.apache.org/jira/browse/SPARK-28699)[SQL] Disable using radix sort for ShuffleExchangeExec in repartition case\n* April 9, 2019 \n+ [[SPARK-26665]](https://issues.apache.org/jira/browse/SPARK-26665)[CORE] Fix a bug that can cause BlockTransferService.fetchBlockSync to hang forever.\n* February 12, 2019 \n+ Fixed an issue that Spark low level network protocol may be broken when sending large RPC error messages with encryption enabled (in [HIPAA compliance features](https://docs.databricks.com/security/privacy/hipaa.html)) or when `spark.network.crypto.enabled` is set to true).\n* January 30, 2019 \n+ Fixed an issue that can cause `df.rdd.count()` with UDT to return incorrect answer for certain cases.\n* December 18, 2018 \n+ Only ignore corrupt files after one or more retries when `spark.sql.files.ignoreCorruptFiles` or `spark.sql.files.ignoreMissingFiles` flag is enabled.\n+ Fixed an issue affecting certain self union queries.\n* November 20, 2018 \n+ [[SPARK-25816]](https://issues.apache.org/jira/browse/SPARK-25816)Fixed attribute resolution in nested extractors.\n* November 6, 2018 \n+ [[SPARK-25714]](https://issues.apache.org/jira/browse/SPARK-25714)Fix Null Handling in the Optimizer rule BooleanSimplification. \n* October 9, 2018 \n+ Fixed a bug affecting `Union` operation. 
\n* September 25, 2018 \n+ [[SPARK-25402]](https://issues.apache.org/jira/browse/SPARK-25402)[SQL] Null handling in BooleanSimplification.\n+ Fixed `NotSerializableException` in Avro data source.\n* September 11, 2018 \n+ Filter reduction should handle null value correctly.\n* August 28, 2018 \n+ Fixed a bug in Delta Lake Delete command that would incorrectly delete the rows where the condition evaluates to null.\n+ [[SPARK-25114]](https://issues.apache.org/jira/browse/SPARK-25114)Fix RecordBinaryComparator when subtraction between two words is divisible by Integer.MAX\\_VALUE.\n* August 23, 2018 \n+ [[SPARK-24809]](https://issues.apache.org/jira/browse/SPARK-24809)Serializing LongHashedRelation in executor may result in data error.\n+ Fixed nullable map issue in Parquet reader.\n+ [[SPARK-25081]](https://issues.apache.org/jira/browse/SPARK-25081)Fixed a bug where ShuffleExternalSorter may access a released memory page when spilling fails to allocate memory.\n+ Fixed an interaction between Databricks Delta and Pyspark which could cause transient read failures.\n* June 28, 2018 \n+ Fixed a bug that could cause incorrect query results when the name of a partition column used in a predicate differs from the case of that column in the schema of the table. \n* May 31, 2018 \n+ Fixed a bug affecting Spark SQL execution engine.\n+ Improved error handling in Delta Lake. 
\n* May 17, 2018 \n+ Improved stability on reading data stored in Azure Data Lake Store.\n+ Fixed a bug affecting RDD caching.\n+ Fixed a bug affecting Null-safe Equal in Spark SQL.\n+ Fixed a bug affecting certain aggregations in streaming queries.\n* April 24, 2018 \n+ Upgraded Azure Data Lake Store SDK from 2.0.11 to 2.2.8 to improve the stability of access to Azure Data Lake Store.\n+ Fixed a bug affecting the insertion of overwrites to partitioned Hive tables when `spark.databricks.io.hive.fastwriter.enabled` is `false`.\n+ Fixed an issue that failed task serialization.\n* March 09, 2018 \n+ Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files.\n* March 01, 2018 \n+ Improved the efficiency of handling streams that can take a long time to stop.\n+ Fixed an issue affecting Python autocomplete.\n+ Applied Ubuntu security patches.\n+ Fixed an issue affecting certain queries using Python UDFs and window functions.\n+ Fixed an issue affecting the use of UDFs on a cluster with table access control enabled.\n* January 29, 2018 \n+ Fixed an issue affecting the manipulation of tables stored in Azure Blob storage.\n+ Fixed aggregation after dropDuplicates on empty DataFrame. \n### [Databricks Runtime 3.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id52) \nSee [Databricks Runtime 3.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/3.4.html). \n* May 31, 2018 \n+ Fixed a bug affecting Spark SQL execution engine.\n+ Improved error handling in Delta Lake. 
\n* May 17, 2018 \n+ Improved stability on reading data stored in Azure Data Lake Store.\n+ Fixed a bug affecting RDD caching.\n+ Fixed a bug affecting Null-safe Equal in Spark SQL.\n* April 24, 2018 \n+ Fixed a bug affecting the insertion of overwrites to partitioned Hive tables when `spark.databricks.io.hive.fastwriter.enabled` is `false`.\n* March 09, 2018 \n+ Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files.\n* December 13, 2017 \n+ Fixed an issue affecting UDFs in Scala.\n+ Fixed an issue affecting the use of Data Skipping Index on data source tables stored in non-DBFS paths.\n* December 07, 2017 \n+ Improved shuffle stability. \n### [Databricks Runtime 3.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id53) \nSee [Databricks Runtime 3.3 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/3.3.html). \n* May 31, 2018 \n+ Fixed a bug affecting Spark SQL execution engine.\n* April 24, 2018 \n+ Fixed a bug affecting the insertion of overwrites to partitioned Hive tables when `spark.databricks.io.hive.fastwriter.enabled` is `false`.\n* March 12, 2018 \n+ Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files.\n* January 29, 2018 \n+ Fixed an issue affecting UDFs in Scala.\n* October 11, 2017 \n+ Improved shuffle stability. \n### [Databricks Runtime 3.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id54) \nSee [Databricks Runtime 3.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/3.2.html). 
\n* March 30, 2018 \n+ Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files.\n* September 13, 2017 \n+ Fixed an issue affecting the use of `spark_submit_task` with Databricks jobs.\n* September 06, 2017 \n+ Fixed an issue affecting the performance of certain window functions. \n### [2.1.1-db6 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#id55) \nSee [2.1.1-db6 cluster image (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/cluster-images/2.1.1-db6.html). \n* May 31, 2018 \n+ Fixed a bug affecting Spark SQL execution engine.\n* March 30, 2018 \n+ Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files. \n#### 2.1.1-db4 (unsupported) \nSee [2.1.1-db4 cluster image (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/cluster-images/2.1.1-db4.html). \n* May 31, 2018 \n+ Fixed a bug affecting Spark SQL execution engine.\n* March 30, 2018 \n+ Fixed an issue caused by a race condition that could, in rare circumstances, lead to loss of some output files.\n\n", "chunk_id": "fdb6920985e20f1df458b27625deb9dd", "url": "https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Maintenance updates for Databricks Runtime (archived)\n##### Unsupported Databricks Runtime releases\n\nFor the original release notes, follow the link below the subheading.\n\n", "chunk_id": "f1237b8cdab9e1bb62dcee565775ae8b", "url": "https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 7.3 LTS for Machine Learning (unsupported)\n\nDatabricks released this image in September 2020. 
It was declared Long Term Support (LTS) in October 2020. \nDatabricks Runtime 7.3 LTS for Machine Learning provides a ready-to-go environment for machine learning and data science based on [Databricks Runtime 7.3 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.3lts.html).\nDatabricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost.\nIt also supports distributed deep learning training using Horovod. \nFor more information, including instructions for creating a Databricks Runtime ML cluster, see [AI and Machine Learning on Databricks](https://docs.databricks.com/machine-learning/index.html). \nFor help with migration from Databricks Runtime 6.x, see [Databricks Runtime 7.x migration guide (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.x-migration.html).\n\n", "chunk_id": "08bfa75829b666bff23e27e5439516a8", "url": "https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 7.3 LTS for Machine Learning (unsupported)\n##### New features and major changes\n\nDatabricks Runtime 7.3 LTS for Machine Learning is built on top of Databricks Runtime 7.3 LTS. For information on what\u2019s new in Databricks Runtime 7.3 LTS, including Apache Spark MLlib and SparkR,\nsee the [Databricks Runtime 7.3 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.3lts.html) release notes. \n### Major changes to Databricks Runtime ML Python environment \n#### Conda activation on workers \nPreviously, when you updated the notebook environment using `%conda`, the new environment was not activated on worker Python processes. This caused issues if a PySpark UDF function called a third-party function that used resources installed inside the Conda environment. This limitation does not exist any more. 
\nYou should also review the major changes to the Databricks Runtime Python environment in [Databricks Runtime 7.3 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.3lts.html).\nFor a full list of installed Python packages and their versions, see [Python libraries](https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html#python-libraries). \n#### Python packages upgraded \n* mlflow 1.9.1 -> 1.11.0\n* tensorflow 2.2.0 -> 2.3.0\n* tensorboard 2.2.2 -> 2.3.0\n* pytorch 1.5.1 -> 1.6.0\n* torchvision 0.6.1 -> 0.7.0\n* petastorm 0.9.2 -> 0.9.5\n\n", "chunk_id": "9651f212bc7510c420f2e33fe0146bc2", "url": "https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 7.3 LTS for Machine Learning (unsupported)\n##### System environment\n\nThe system environment in Databricks Runtime 7.3 LTS for Machine Learning differs from Databricks Runtime 7.3 LTS as follows: \n* **DBUtils**: Databricks Runtime ML does not contain [Library utility (dbutils.library) (legacy)](https://docs.databricks.com/archive/dev-tools/dbutils-library.html).\nYou can use `%pip` and `%conda` commands instead. 
See [Notebook-scoped Python libraries](https://docs.databricks.com/libraries/notebooks-python-libraries.html).\n* For GPU clusters, Databricks Runtime ML includes the following NVIDIA GPU libraries: \n+ CUDA 10.1 Update 2\n+ cuDNN 7.6.5\n+ NCCL 2.7.3\n+ TensorRT 6.0.1\n\n", "chunk_id": "8dfbc0ad352fd2e51bfac89a4d691f9a", "url": "https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 7.3 LTS for Machine Learning (unsupported)\n##### Libraries\n\nThe following sections list the libraries included in Databricks Runtime 7.3 LTS for Machine Learning that differ from those\nincluded in Databricks Runtime 7.3 LTS. \nIn this section: \n* [Top-tier libraries](https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html#top-tier-libraries)\n* [Python libraries](https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html#python-libraries)\n* [R libraries](https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html#r-libraries)\n* [Java and Scala libraries (Scala 2.12 cluster)](https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html#java-and-scala-libraries-scala-212-cluster) \n### [Top-tier libraries](https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html#id1) \nDatabricks Runtime 7.3 LTS for Machine Learning includes the following top-tier [libraries](https://docs.databricks.com/machine-learning/index.html): \n* [GraphFrames](https://docs.databricks.com/integrations/graphframes/index.html)\n* [Horovod and HorovodRunner](https://docs.databricks.com/machine-learning/train-model/distributed-training/index.html)\n* [MLflow](https://docs.databricks.com/mlflow/index.html)\n* [PyTorch](https://docs.databricks.com/machine-learning/train-model/pytorch.html)\n* 
[spark-tensorflow-connector](https://docs.databricks.com/machine-learning/load-data/tfrecords-save-load.html#df-to-tfrecord)\n* [TensorFlow](https://docs.databricks.com/machine-learning/train-model/tensorflow.html)\n* [TensorBoard](https://docs.databricks.com/machine-learning/train-model/tensorboard.html) \n### [Python libraries](https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html#id2) \nDatabricks Runtime 7.3 LTS for Machine Learning uses Conda for Python package management and includes many popular ML packages. \nIn addition to the packages specified in the Conda environments in the following sections, Databricks Runtime 7.3 LTS for Machine Learning also installs the following packages: \n* hyperopt 0.2.4.db2\n* sparkdl 2.1.0-db1 \n#### Python libraries on CPU clusters \n```\nname: databricks-ml\nchannels:\n- pytorch\n- defaults\ndependencies:\n- _libgcc_mutex=0.1=main\n- absl-py=0.9.0=py37_0\n- asn1crypto=1.3.0=py37_1\n- astor=0.8.0=py37_0\n- backcall=0.1.0=py37_0\n- backports=1.0=py_2\n- bcrypt=3.2.0=py37h7b6447c_0\n- blas=1.0=mkl\n- blinker=1.4=py37_0\n- boto3=1.12.0=py_0\n- botocore=1.15.0=py_0\n- c-ares=1.15.0=h7b6447c_1001\n- ca-certificates=2020.7.22=0\n- cachetools=4.1.1=py_0\n- certifi=2020.6.20=pyhd3eb1b0_3 # (updated from py37_0 in June 15, 2021 maintenance update)\n- cffi=1.14.0=py37he30daa8_1 # (updated from py37h2e261b9_0 in June 15, 2021 maintenance update)\n- chardet=3.0.4=py37_1003\n- click=7.0=py37_0\n- cloudpickle=1.3.0=py_0\n- configparser=3.7.4=py37_0\n- cpuonly=1.0=0\n- cryptography=2.8=py37h1ba5d50_0\n- cycler=0.10.0=py37_0\n- cython=0.29.15=py37he6710b0_0\n- decorator=4.4.1=py_0\n- dill=0.3.1.1=py37_1\n- docutils=0.15.2=py37_0\n- entrypoints=0.3=py37_0\n- flask=1.1.1=py_1\n- freetype=2.9.1=h8a8886c_1\n- future=0.18.2=py37_1\n- gast=0.3.3=py_0\n- gitdb=4.0.5=py_0\n- gitpython=3.1.0=py_0\n- google-auth=1.11.2=py_0\n- google-auth-oauthlib=0.4.1=py_2\n- google-pasta=0.2.0=py_0\n- grpcio=1.27.2=py37hf8bcb03_0\n- 
gunicorn=20.0.4=py37_0\n- h5py=2.10.0=py37h7918eee_0\n- hdf5=1.10.4=hb1b8bf9_0\n- icu=58.2=he6710b0_3\n- idna=2.8=py37_0\n- intel-openmp=2020.0=166\n- ipykernel=5.1.4=py37h39e3cac_0\n- ipython=7.12.0=py37h5ca1d4c_0\n- ipython_genutils=0.2.0=py37_0\n- isodate=0.6.0=py_1\n- itsdangerous=1.1.0=py37_0\n- jedi=0.14.1=py37_0\n- jinja2=2.11.1=py_0\n- jmespath=0.10.0=py_0\n- joblib=0.14.1=py_0\n- jpeg=9b=h024ee3a_2\n- jupyter_client=5.3.4=py37_0\n- jupyter_core=4.6.1=py37_0\n- kiwisolver=1.1.0=py37he6710b0_0\n- krb5=1.17.1=h173b8e3_0 # (updated from 1.16.4 in June 15, 2021 maintenance update)\n- ld_impl_linux-64=2.33.1=h53a641e_7\n- libedit=3.1.20181209=hc058e9b_0\n- libffi=3.3=he6710b0_2 # (updated from 3.2.1 in June 15, 2021 maintenance update)\n- libgcc-ng=9.1.0=hdf63c60_0\n- libgfortran-ng=7.3.0=hdf63c60_0\n- libpng=1.6.37=hbc83047_0\n- libpq=12.2=h20c2e04_0 # (updated from 11.2 in June 15, 2021 maintenance update)\n- libprotobuf=3.11.4=hd408876_0\n- libsodium=1.0.16=h1bed415_0\n- libstdcxx-ng=9.1.0=hdf63c60_0\n- libtiff=4.1.0=h2733197_0\n- lightgbm=2.3.0=py37he6710b0_0\n- lz4-c=1.8.1.2=h14c3975_0\n- mako=1.1.2=py_0\n- markdown=3.1.1=py37_0\n- markupsafe=1.1.1=py37h14c3975_1\n- matplotlib-base=3.1.3=py37hef1b27d_0\n- mkl=2020.0=166\n- mkl-service=2.3.0=py37he904b0f_0\n- mkl_fft=1.0.15=py37ha843d7b_0\n- mkl_random=1.1.0=py37hd6b4f25_0\n- ncurses=6.2=he6710b0_1\n- networkx=2.4=py_1\n- ninja=1.10.0=py37hfd86e86_0\n- nltk=3.4.5=py37_0\n- numpy=1.18.1=py37h4f9e942_0\n- numpy-base=1.18.1=py37hde5b4d6_1\n- oauthlib=3.1.0=py_0\n- olefile=0.46=py37_0\n- openssl=1.1.1k=h27cfd23_0 # (updated from 1.1.1g in June 15, 2021 maintenance update)\n- packaging=20.1=py_0\n- pandas=1.0.1=py37h0573a6f_0\n- paramiko=2.7.1=py_0\n- parso=0.5.2=py_0\n- patsy=0.5.1=py37_0\n- pexpect=4.8.0=py37_1\n- pickleshare=0.7.5=py37_1001\n- pillow=7.0.0=py37hb39fc2d_0\n- pip=20.0.2=py37_3\n- plotly=4.9.0=py_0\n- prompt_toolkit=3.0.3=py_0\n- protobuf=3.11.4=py37he6710b0_0\n- psutil=5.6.7=py37h7b6447c_0\n- 
psycopg2=2.8.6=py37h3c74f83_1 # (updated from 2.8.4 in June 15, 2021 maintenance update)\n- ptyprocess=0.6.0=py37_0\n- pyasn1=0.4.8=py_0\n- pyasn1-modules=0.2.7=py_0\n- pycparser=2.19=py37_0\n- pygments=2.5.2=py_0\n- pyjwt=1.7.1=py37_0\n- pynacl=1.3.0=py37h7b6447c_0\n- pyodbc=4.0.30=py37he6710b0_0\n- pyopenssl=19.1.0=py_1\n- pyparsing=2.4.6=py_0\n- pysocks=1.7.1=py37_1\n- python=3.7.10=hdb3f193_0 # (updated from 3.7.6 in June 15, 2021 maintenance update)\n- python-dateutil=2.8.1=py_0\n- python-editor=1.0.4=py_0\n- pytorch=1.6.0=py3.7_cpu_0\n- pytz=2019.3=py_0\n- pyzmq=18.1.1=py37he6710b0_0\n- readline=8.1=h27cfd23_0 # (updated from 7.0 in June 15, 2021 maintenance update)\n- requests=2.22.0=py37_1\n- requests-oauthlib=1.3.0=py_0\n- retrying=1.3.3=py37_2\n- rsa=4.0=py_0\n- s3transfer=0.3.3=py37_1\n- scikit-learn=0.22.1=py37hd81dba3_0\n- scipy=1.4.1=py37h0b6359f_0\n- setuptools=45.2.0=py37_0\n- simplejson=3.17.0=py37h7b6447c_0\n- six=1.14.0=py37_0\n- smmap=3.0.4=py_0\n- sqlite=3.35.4=hdfb4753_0 # (updated from 3.31.1 in June 15, 2021 maintenance update)\n- sqlparse=0.3.0=py_0\n- statsmodels=0.11.0=py37h7b6447c_0\n- tabulate=0.8.3=py37_0\n- tk=8.6.10=hbc83047_0 # (updated from 8.6.8 in June 15, 2021 maintenance update)\n- torchvision=0.7.0=py37_cpu\n- tornado=6.0.3=py37h7b6447c_3\n- tqdm=4.42.1=py_0\n- traitlets=4.3.3=py37_0\n- unixodbc=2.3.7=h14c3975_0\n- urllib3=1.25.8=py37_0\n- wcwidth=0.1.8=py_0\n- websocket-client=0.56.0=py37_0\n- werkzeug=1.0.0=py_0\n- wheel=0.34.2=py37_0\n- wrapt=1.11.2=py37h7b6447c_0\n- xz=5.2.5=h7b6447c_0 # (updated from 5.2.4 in June 15, 2021 maintenance update)\n- zeromq=4.3.1=he6710b0_3\n- zlib=1.2.11=h7b6447c_3\n- zstd=1.3.7=h0b5b093_0\n- pip:\n- astunparse==1.6.3\n- azure-core==1.8.0\n- azure-storage-blob==12.4.0\n- databricks-cli==0.11.0\n- diskcache==5.0.2\n- docker==4.3.1\n- gorilla==0.3.0\n- horovod==0.19.5\n- joblibspark==0.2.0\n- keras-preprocessing==1.1.2\n- koalas==1.2.0\n- mleap==0.16.1\n- mlflow==1.11.0\n- msrest==0.6.18\n- 
opt-einsum==3.3.0\n- petastorm==0.9.5\n- pyarrow==1.0.1\n- pyyaml==5.3.1\n- querystring-parser==1.2.4\n- seaborn==0.10.0\n- spark-tensorflow-distributor==0.1.0\n- tensorboard==2.3.0\n- tensorboard-plugin-wit==1.7.0\n- tensorflow-cpu==2.3.0\n- tensorflow-estimator==2.3.0\n- termcolor==1.1.0\n- xgboost==1.1.1\nprefix: /databricks/conda/envs/databricks-ml\n\n``` \n#### Python libraries on GPU clusters \n```\nname: databricks-ml-gpu\nchannels:\n- pytorch\n- defaults\ndependencies:\n- _libgcc_mutex=0.1=main\n- absl-py=0.9.0=py37_0\n- asn1crypto=1.3.0=py37_1\n- astor=0.8.0=py37_0\n- backcall=0.1.0=py37_0\n- backports=1.0=py_2\n- bcrypt=3.2.0=py37h7b6447c_0\n- blas=1.0=mkl\n- blinker=1.4=py37_0\n- boto3=1.12.0=py_0\n- botocore=1.15.0=py_0\n- c-ares=1.15.0=h7b6447c_1001\n- ca-certificates=2020.7.22=0\n- cachetools=4.1.1=py_0\n- certifi=2020.6.20=pyhd3eb1b0_3 # (updated from py37_0 in June 15, 2021 maintenance update)\n- cffi=1.14.0=py37he30daa8_1 # (updated from py37h2e261b9_0 in June 15, 2021 maintenance update)\n- chardet=3.0.4=py37_1003\n- click=7.0=py37_0\n- cloudpickle=1.3.0=py_0\n- configparser=3.7.4=py37_0\n- cryptography=2.8=py37h1ba5d50_0\n- cudatoolkit=10.1.243=h6bb024c_0\n- cycler=0.10.0=py37_0\n- cython=0.29.15=py37he6710b0_0\n- decorator=4.4.1=py_0\n- dill=0.3.1.1=py37_1\n- docutils=0.15.2=py37_0\n- entrypoints=0.3=py37_0\n- flask=1.1.1=py_1\n- freetype=2.9.1=h8a8886c_1\n- future=0.18.2=py37_1\n- gast=0.3.3=py_0\n- gitdb=4.0.5=py_0\n- gitpython=3.1.0=py_0\n- google-auth=1.11.2=py_0\n- google-auth-oauthlib=0.4.1=py_2\n- google-pasta=0.2.0=py_0\n- grpcio=1.27.2=py37hf8bcb03_0\n- gunicorn=20.0.4=py37_0\n- h5py=2.10.0=py37h7918eee_0\n- hdf5=1.10.4=hb1b8bf9_0\n- icu=58.2=he6710b0_3\n- idna=2.8=py37_0\n- intel-openmp=2020.0=166\n- ipykernel=5.1.4=py37h39e3cac_0\n- ipython=7.12.0=py37h5ca1d4c_0\n- ipython_genutils=0.2.0=py37_0\n- isodate=0.6.0=py_1\n- itsdangerous=1.1.0=py37_0\n- jedi=0.14.1=py37_0\n- jinja2=2.11.1=py_0\n- jmespath=0.10.0=py_0\n- 
joblib=0.14.1=py_0\n- jpeg=9b=h024ee3a_2\n- jupyter_client=5.3.4=py37_0\n- jupyter_core=4.6.1=py37_0\n- kiwisolver=1.1.0=py37he6710b0_0\n- krb5=1.17.1=h173b8e3_0 # (updated from 1.16.4 in June 15, 2021 maintenance update)\n- ld_impl_linux-64=2.33.1=h53a641e_7\n- libedit=3.1.20181209=hc058e9b_0\n- libffi=3.3=he6710b0_2 # (updated from 3.2.1 in June 15, 2021 maintenance update)\n- libgcc-ng=9.1.0=hdf63c60_0\n- libgfortran-ng=7.3.0=hdf63c60_0\n- libpng=1.6.37=hbc83047_0\n- libpq=12.2=h20c2e04_0 # (updated from 11.2 in June 15, 2021 maintenance update)\n- libprotobuf=3.11.4=hd408876_0\n- libsodium=1.0.16=h1bed415_0\n- libstdcxx-ng=9.1.0=hdf63c60_0\n- libtiff=4.1.0=h2733197_0\n- lightgbm=2.3.0=py37he6710b0_0\n- lz4-c=1.8.1.2=h14c3975_0\n- mako=1.1.2=py_0\n- markdown=3.1.1=py37_0\n- markupsafe=1.1.1=py37h14c3975_1\n- matplotlib-base=3.1.3=py37hef1b27d_0\n- mkl=2020.0=166\n- mkl-service=2.3.0=py37he904b0f_0\n- mkl_fft=1.0.15=py37ha843d7b_0\n- mkl_random=1.1.0=py37hd6b4f25_0\n- ncurses=6.2=he6710b0_1\n- networkx=2.4=py_1\n- ninja=1.10.0=py37hfd86e86_0\n- nltk=3.4.5=py37_0\n- numpy=1.18.1=py37h4f9e942_0\n- numpy-base=1.18.1=py37hde5b4d6_1\n- oauthlib=3.1.0=py_0\n- olefile=0.46=py37_0\n- openssl=1.1.1k=h27cfd23_0 # (updated from 1.1.1g in June 15, 2021 maintenance update)\n- packaging=20.1=py_0\n- pandas=1.0.1=py37h0573a6f_0\n- paramiko=2.7.1=py_0\n- parso=0.5.2=py_0\n- patsy=0.5.1=py37_0\n- pexpect=4.8.0=py37_1\n- pickleshare=0.7.5=py37_1001\n- pillow=7.0.0=py37hb39fc2d_0\n- pip=20.0.2=py37_3\n- plotly=4.9.0=py_0\n- prompt_toolkit=3.0.3=py_0\n- protobuf=3.11.4=py37he6710b0_0\n- psutil=5.6.7=py37h7b6447c_0\n- psycopg2=2.8.6=py37h3c74f83_1 # (updated from 2.8.4 in June 15, 2021 maintenance update)\n- ptyprocess=0.6.0=py37_0\n- pyasn1=0.4.8=py_0\n- pyasn1-modules=0.2.7=py_0\n- pycparser=2.19=py37_0\n- pygments=2.5.2=py_0\n- pyjwt=1.7.1=py37_0\n- pynacl=1.3.0=py37h7b6447c_0\n- pyodbc=4.0.30=py37he6710b0_0\n- pyopenssl=19.1.0=py_1\n- pyparsing=2.4.6=py_0\n- 
pysocks=1.7.1=py37_1\n- python=3.7.10=hdb3f193_0 # (updated from 3.7.6 in June 15, 2021 maintenance update)\n- python-dateutil=2.8.1=py_0\n- python-editor=1.0.4=py_0\n- pytorch=1.6.0=py3.7_cuda10.1.243_cudnn7.6.3_0\n- pytz=2019.3=py_0\n- pyzmq=18.1.1=py37he6710b0_0\n- readline=8.1=h27cfd23_0 # (updated from 7.0 in June 15, 2021 maintenance update)\n- requests=2.22.0=py37_1\n- requests-oauthlib=1.3.0=py_0\n- retrying=1.3.3=py37_2\n- rsa=4.0=py_0\n- s3transfer=0.3.3=py37_1\n- scikit-learn=0.22.1=py37hd81dba3_0\n- scipy=1.4.1=py37h0b6359f_0\n- setuptools=45.2.0=py37_0\n- simplejson=3.17.0=py37h7b6447c_0\n- six=1.14.0=py37_0\n- smmap=3.0.4=py_0\n- sqlite=3.35.4=hdfb4753_0 # (updated from 3.31.1 in June 15, 2021 maintenance update)\n- sqlparse=0.3.0=py_0\n- statsmodels=0.11.0=py37h7b6447c_0\n- tabulate=0.8.3=py37_0\n- tk=8.6.10=hbc83047_0 # (updated from 8.6.8 in June 15, 2021 maintenance update)\n- torchvision=0.7.0=py37_cu101\n- tornado=6.0.3=py37h7b6447c_3\n- tqdm=4.42.1=py_0\n- traitlets=4.3.3=py37_0\n- unixodbc=2.3.7=h14c3975_0\n- urllib3=1.25.8=py37_0\n- wcwidth=0.1.8=py_0\n- websocket-client=0.56.0=py37_0\n- werkzeug=1.0.0=py_0\n- wheel=0.34.2=py37_0\n- wrapt=1.11.2=py37h7b6447c_0\n- xz=5.2.5=h7b6447c_0 # (updated from 5.2.4 in June 15, 2021 maintenance update)\n- zeromq=4.3.1=he6710b0_3\n- zlib=1.2.11=h7b6447c_3\n- zstd=1.3.7=h0b5b093_0\n- pip:\n- astunparse==1.6.3\n- azure-core==1.8.0\n- azure-storage-blob==12.4.0\n- databricks-cli==0.11.0\n- diskcache==5.0.2\n- docker==4.3.1\n- gorilla==0.3.0\n- horovod==0.19.5\n- joblibspark==0.2.0\n- keras-preprocessing==1.1.2\n- koalas==1.2.0\n- mleap==0.16.1\n- mlflow==1.11.0\n- msrest==0.6.18\n- opt-einsum==3.3.0\n- petastorm==0.9.5\n- pyarrow==1.0.1\n- pyyaml==5.3.1\n- querystring-parser==1.2.4\n- seaborn==0.10.0\n- spark-tensorflow-distributor==0.1.0\n- tensorboard==2.3.0\n- tensorboard-plugin-wit==1.7.0\n- tensorflow==2.3.0\n- tensorflow-estimator==2.3.0\n- termcolor==1.1.0\n- xgboost==1.1.1\nprefix: 
/databricks/conda/envs/databricks-ml-gpu\n\n``` \n#### Spark packages containing Python modules \n| Spark Package | Python Module | Version |\n| --- | --- | --- |\n| graphframes | graphframes | 0.8.0-db2-spark3.0 | \n### [R libraries](https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html#id3) \nThe R libraries are identical to the [R Libraries](https://docs.databricks.com/archive/runtime-release-notes/7.3lts.html#rlibraries) in Databricks Runtime 7.3 LTS. \n### [Java and Scala libraries (Scala 2.12 cluster)](https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html#id4) \nIn addition to Java and Scala libraries in Databricks Runtime 7.3 LTS, Databricks Runtime 7.3 LTS for Machine Learning contains the following JARs: \n| Group ID | Artifact ID | Version |\n| --- | --- | --- |\n| com.typesafe.akka | akka-actor\\_2.12 | 2.5.23 |\n| ml.combust.mleap | mleap-databricks-runtime\\_2.12 | 0.17.3-4882dc3 |\n| ml.dmlc | xgboost4j-spark\\_2.12 | 1.0.0 |\n| ml.dmlc | xgboost4j\\_2.12 | 1.0.0 |\n| org.mlflow | mlflow-client | 1.11.0 |\n| org.scala-lang.modules | scala-java8-compat\\_2.12 | 0.8.0 |\n| org.tensorflow | spark-tensorflow-connector\\_2.12 | 1.15.0 |\n\n", "chunk_id": "afe71a40efb70335eae1ee271ff0c1cf", "url": "https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html"} +{"chunked_text": "# Develop on Databricks\n## What are user-defined functions (UDFs)?\n#### User-defined aggregate functions - Scala\n\nThis article contains an example of a UDAF and how to register it for use in Apache Spark SQL. 
See [User-defined aggregate functions (UDAFs)](https://docs.databricks.com/sql/language-manual/sql-ref-functions-udf-aggregate.html) for more details.\n\n", "chunk_id": "8b1bd64da5dd605ee0b1933e6e0635f5", "url": "https://docs.databricks.com/udf/aggregate-scala.html"} +{"chunked_text": "# Develop on Databricks\n## What are user-defined functions (UDFs)?\n#### User-defined aggregate functions - Scala\n##### Implement a `UserDefinedAggregateFunction`\n\n```\nimport org.apache.spark.sql.expressions.MutableAggregationBuffer\nimport org.apache.spark.sql.expressions.UserDefinedAggregateFunction\nimport org.apache.spark.sql.Row\nimport org.apache.spark.sql.types._\n\nclass GeometricMean extends UserDefinedAggregateFunction {\n// These are the input fields for your aggregate function.\noverride def inputSchema: org.apache.spark.sql.types.StructType =\nStructType(StructField(\"value\", DoubleType) :: Nil)\n\n// These are the internal fields you keep for computing your aggregate.\noverride def bufferSchema: StructType = StructType(\nStructField(\"count\", LongType) ::\nStructField(\"product\", DoubleType) :: Nil\n)\n\n// This is the output type of your aggregation function.\noverride def dataType: DataType = DoubleType\n\noverride def deterministic: Boolean = true\n\n// This is the initial value for your buffer schema.\noverride def initialize(buffer: MutableAggregationBuffer): Unit = {\nbuffer(0) = 0L\nbuffer(1) = 1.0\n}\n\n// This is how to update your buffer schema given an input.\noverride def update(buffer: MutableAggregationBuffer, input: Row): Unit = {\nbuffer(0) = buffer.getAs[Long](0) + 1\nbuffer(1) = buffer.getAs[Double](1) * input.getAs[Double](0)\n}\n\n// This is how to merge two objects with the bufferSchema type.\noverride def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {\nbuffer1(0) = buffer1.getAs[Long](0) + buffer2.getAs[Long](0)\nbuffer1(1) = buffer1.getAs[Double](1) * buffer2.getAs[Double](1)\n}\n\n// This is where you output the final 
value, given the final value of your bufferSchema.\noverride def evaluate(buffer: Row): Any = {\nmath.pow(buffer.getDouble(1), 1.toDouble / buffer.getLong(0))\n}\n}\n\n```\n\n", "chunk_id": "7e247040ff588be7528c28891c6cd48c", "url": "https://docs.databricks.com/udf/aggregate-scala.html"} +{"chunked_text": "# Develop on Databricks\n## What are user-defined functions (UDFs)?\n#### User-defined aggregate functions - Scala\n##### Register the UDAF with Spark SQL\n\n```\nspark.udf.register(\"gm\", new GeometricMean)\n\n```\n\n#### User-defined aggregate functions - Scala\n##### Use your UDAF\n\n```\n// Create a DataFrame and Spark SQL table\nimport org.apache.spark.sql.functions._\n\nval ids = spark.range(1, 20)\nids.createOrReplaceTempView(\"ids\")\nval df = spark.sql(\"select id, id % 3 as group_id from ids\")\ndf.createOrReplaceTempView(\"simple\")\n\n``` \n```\n-- Use a group_by statement and call the UDAF.\nselect group_id, gm(id) from simple group by group_id\n\n``` \n```\n// Or use DataFrame syntax to call the aggregate function.\n\n// Create an instance of UDAF GeometricMean.\nval gm = new GeometricMean\n\n// Show the geometric mean of values of column \"id\".\ndf.groupBy(\"group_id\").agg(gm(col(\"id\")).as(\"GeometricMean\")).show()\n\n// Invoke the UDAF by its assigned name.\ndf.groupBy(\"group_id\").agg(expr(\"gm(id) as GeometricMean\")).show()\n\n```\n\n", "chunk_id": "5a11807945d32e3e34f8015fdbd4a0a5", "url": "https://docs.databricks.com/udf/aggregate-scala.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n", "chunk_id": "588689b77102abf1c5a83cc9756ee1d6", "url": "https://docs.databricks.com/dev-tools/visual-studio-code.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### How can I use Visual Studio Code with Databricks?\n\n[Visual Studio Code](https://code.visualstudio.com/docs) by Microsoft is a lightweight but powerful source code editor which runs on your desktop and is available for 
Windows, macOS, and Linux. It comes with built-in support for JavaScript, TypeScript and Node.js and has a rich ecosystem of extensions for other languages and runtimes (such as C++, C#, Java, Python, PHP, Go, and .NET). Visual Studio Code combines the simplicity of a source code editor with powerful developer tooling, like IntelliSense code completion and debugging. You can use Visual Studio Code on your local development machine to write, run, and debug code in Databricks, interact with Databricks SQL warehouses in remote Databricks workspaces, and more, as follows: \n| Name | Use this when you want to\u2026 |\n| --- | --- |\n| [Databricks extension for Visual Studio Code](https://docs.databricks.com/dev-tools/vscode-ext/index.html) | Use Visual Studio Code to write and run local Python, R, Scala, and SQL code on a remote Databricks workspace. |\n| [Databricks Connect in Visual Studio Code with Python](https://docs.databricks.com/dev-tools/databricks-connect/python/vscode.html) | Use Visual Studio Code to write, run, and debug local Python code on a remote Databricks workspace. |\n| [Databricks Connect in Visual Studio Code with Scala](https://docs.databricks.com/dev-tools/databricks-connect/scala/vscode.html) | Use Visual Studio Code to write, run, and debug local Scala code on a remote Databricks workspace. |\n| [Databricks Asset Bundles](https://docs.databricks.com/dev-tools/bundles/index.html) | Use Visual Studio Code to make authoring, deploying, and running bundles easier. *Databricks Asset Bundles* (or *bundles* for short) enable you to programmatically define, deploy, and run Databricks jobs, Delta Live Tables pipelines, and MLOps Stacks by using CI/CD best practices and workflows. |\n| [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html) | Use the built-in Terminal in Visual Studio Code to work with Databricks from the command line. 
|\n| [Databricks SDKs](https://docs.databricks.com/dev-tools/index-sdk.html) | Use the built-in programming language support in Visual Studio Code to write, run, and debug Python, Java, and Go code that works with Databricks. |\n| [Databricks Driver for SQLTools](https://docs.databricks.com/dev-tools/sqltools-driver.html) | Use a graphical user interface in Visual Studio Code to query Databricks SQL warehouses in remote Databricks workspaces. |\n| [Databricks SQL connectors, drivers, and APIs](https://docs.databricks.com/dev-tools/index-driver.html) | Use the built-in programming language support in Visual Studio Code to write, run, and debug Python, Go, JavaScript, TypeScript, and Node.js code that works with Databricks SQL warehouses in remote Databricks workspaces. |\n| [Provision infrastructure](https://docs.databricks.com/dev-tools/index-iac.html) | Use third-party plugins such as the Hashicorp Terraform Extension for Visual Studio Code to make it easier to provision Databricks infrastructure with Terraform and follow infrastructure-as-code (IaC) best practices. Use the built-in programming language support in Visual Studio Code to write and deploy Python, TypeScript, Java, C#, and Go definitions of Databricks infrastructure through third-party offerings such as the Cloud Development Kit for Terraform (CDKTF) and Pulumi. |\n\n", "chunk_id": "764e6b2b9d33d8dd1d29f70e17acf619", "url": "https://docs.databricks.com/dev-tools/visual-studio-code.html"} +{"chunked_text": "# Introduction to Databricks Lakehouse Monitoring\n", "chunk_id": "46a5182ce48b20d7617a750390b29baa", "url": "https://docs.databricks.com/lakehouse-monitoring/monitor-alerts.html"} +{"chunked_text": "# Introduction to Databricks Lakehouse Monitoring\n### Monitor alerts\n\nPreview \nThis feature is in [Public Preview](https://docs.databricks.com/release-notes/release-types.html). \nThis page describes how to create a Databricks SQL alert based on a metric from a monitor metrics table. 
Some common uses for monitor alerts include: \n* Get notified when a statistic moves out of a certain range. For example, you want to receive a notification when the fraction of missing values exceeds a certain level.\n* Get notified of a change in the data. The drift metrics table stores statistics that track changes in the data distribution.\n* Get notified if data has drifted in comparison to the baseline table. You can set up an alert to investigate the data changes or, for `InferenceLog` analysis, to indicate that the model should be retrained. \nMonitor alerts are created and used the same way as other Databricks SQL alerts. You create a [Databricks SQL query](https://docs.databricks.com/sql/user/queries/index.html) on the monitor profile metrics table or drift metrics table. You then create a Databricks SQL alert for this query. You can configure the alert to evaluate the query at a desired frequency, and send a notification if the alert is triggered. By default, an email notification is sent. You can also set up a webhook or send notifications to other applications such as Slack or PagerDuty. \nYou can also quickly create an alert from the [monitor dashboard](https://docs.databricks.com/lakehouse-monitoring/monitor-dashboard.html) as follows: \n1. On the dashboard, find the chart for which you want to create an alert.\n2. Click ![Kebab menu](https://docs.databricks.com/_images/kebab-menu.png) in the upper-right corner of the chart and select **View query**. The SQL editor opens.\n3. In the SQL editor, click ![Kebab menu](https://docs.databricks.com/_images/kebab-menu.png) above the editor window and select **Create alert**. The **New alert** dialog opens in a new tab.\n4. Configure the alert and click **Create alert**. \nNote that if the query uses parameters, then the alert is based on the default values for these parameters. You should confirm that the default values reflect the intent of the alert. 
\nFor details, see [Databricks SQL alerts](https://docs.databricks.com/sql/user/alerts/index.html).\n\n", "chunk_id": "ad46dd649af7864f934123978c11d492", "url": "https://docs.databricks.com/lakehouse-monitoring/monitor-alerts.html"} +{"chunked_text": "# Databricks administration introduction\n## Monitor usage with system tables\n#### Warehouse events system table reference\n\nPreview \nThis feature is in [Public Preview](https://docs.databricks.com/release-notes/release-types.html). \nIn this article, you learn how to use the warehouse events system table to monitor and manage the SQL warehouses in your workspaces. This table records a row for every time a warehouse starts, stops, runs, and scales up and down. You can use the sample queries in this article with alerts to keep you informed of changes to your warehouses. \nThe warehouse events system table is located at `system.compute.warehouse_events`.\n\n#### Warehouse events system table reference\n##### Logged warehouse event types\n\nThis system table logs the following types of events: \n* `SCALED_UP`: A new cluster was added to the warehouse.\n* `SCALED_DOWN`: A cluster was removed from the warehouse.\n* `STOPPING`: The warehouse is in the process of stopping.\n* `RUNNING`: The warehouse is actively running.\n* `STARTING`: The warehouse is in the process of starting up.\n* `STOPPED`: The warehouse has completely stopped running.\n\n", "chunk_id": "8f7d0222ccc9fb52f7513dceb6386e0b", "url": "https://docs.databricks.com/admin/system-tables/warehouse-events.html"} +{"chunked_text": "# Databricks administration introduction\n## Monitor usage with system tables\n#### Warehouse events system table reference\n##### Warehouse events schema\n\nThe `warehouse_events` system table uses the following schema: \n| Column name | Data type | Description | Example |\n| --- | --- | --- | --- |\n| `account_id` | string | The ID of the Databricks account. 
| `7af234db-66d7-4db3-bbf0-956098224879` |\n| `workspace_id` | string | The ID of the workspace where the warehouse is deployed. | `123456789012345` |\n| `warehouse_id` | string | The ID of the SQL warehouse the event is related to. | `123456789012345` |\n| `event_type` | string | The type of warehouse event. Possible values are `SCALED_UP`, `SCALED_DOWN`, `STOPPING`, `RUNNING`, `STARTING`, and `STOPPED`. | `SCALED_UP` |\n| `cluster_count` | integer | The number of clusters that are actively running. | `2` |\n| `event_time` | timestamp | Timestamp of when the event took place. | `2023-07-20T19:13:09.504Z` |\n\n", "chunk_id": "2922ad5949f4586f2934b19600e8f5c6", "url": "https://docs.databricks.com/admin/system-tables/warehouse-events.html"} +{"chunked_text": "# Databricks administration introduction\n## Monitor usage with system tables\n#### Warehouse events system table reference\n##### Sample queries\n\nThe following sample queries are templates. Plug in whatever values make sense for your organization. You can also add alerts to these queries to help you stay informed about changes to your warehouses. See [Create an alert](https://docs.databricks.com/sql/user/alerts/index.html#create-alert). 
\nUse the following sample queries to gain insight into warehouse behavior: \n* [Which warehouses are actively running and for how long?](https://docs.databricks.com/admin/system-tables/warehouse-events.html#active)\n* [Identify warehouses that are upscaled longer than expected](https://docs.databricks.com/admin/system-tables/warehouse-events.html#upscaled)\n* [Warehouses that start for the first time](https://docs.databricks.com/admin/system-tables/warehouse-events.html#new)\n* [Investigate billing charges](https://docs.databricks.com/admin/system-tables/warehouse-events.html#billing)\n* [Which warehouses haven\u2019t been used in the last 30 days?](https://docs.databricks.com/admin/system-tables/warehouse-events.html#not-used)\n* [Warehouses with the most uptime in a month](https://docs.databricks.com/admin/system-tables/warehouse-events.html#month)\n* [Warehouses that spent the most time upscaled during a month](https://docs.databricks.com/admin/system-tables/warehouse-events.html#upscale-month) \n### Which warehouses are actively running and for how long? \nThis query identifies which warehouses are currently active along with their running time in hours. \n```\nUSE CATALOG `system`;\n\nSELECT\nwe.warehouse_id,\nwe.event_time,\nTIMESTAMPDIFF(MINUTE, we.event_time, CURRENT_TIMESTAMP()) / 60.0 AS running_hours,\nwe.cluster_count\nFROM\ncompute.warehouse_events we\nWHERE\nwe.event_type = 'RUNNING'\nAND NOT EXISTS (\nSELECT 1\nFROM compute.warehouse_events we2\nWHERE we2.warehouse_id = we.warehouse_id\nAND we2.event_time > we.event_time\n)\n\n``` \n**Alert opportunity**: As a workspace admin, you might want to be alerted if a warehouse is running longer than expected. For example, you can use the query results to set an alert condition to trigger when the running hours exceed a certain threshold. \n### Identify warehouses that are upscaled longer than expected \nThis query identifies which warehouses are currently scaled up to two or more clusters, along with the number of hours they have been in that state. 
\n```\nuse catalog `system`;\n\nSELECT\nwe.warehouse_id,\nwe.event_time,\nTIMESTAMPDIFF(MINUTE, we.event_time, CURRENT_TIMESTAMP()) / 60.0 AS upscaled_hours,\nwe.cluster_count\nFROM\ncompute.warehouse_events we\nWHERE\nwe.event_type = 'SCALED_UP'\nAND we.cluster_count >= 2\nAND NOT EXISTS (\nSELECT 1\nFROM compute.warehouse_events we2\nWHERE we2.warehouse_id = we.warehouse_id\nAND (\n(we2.event_type = 'SCALED_DOWN') OR\n(we2.event_type = 'SCALED_UP' AND we2.cluster_count < 2)\n)\nAND we2.event_time > we.event_time\n)\n\n``` \n**Alert opportunity**: Alerting on this condition can help you monitor resources and cost. You could set an alert for when the upscaled hours exceed a certain limit. \n### Warehouses that start for the first time \nThis query informs you about new warehouses that are starting for the first time. \n```\nuse catalog `system`;\n\nSELECT\nwe.warehouse_id,\nwe.event_time,\nwe.cluster_count\nFROM\ncompute.warehouse_events we\nWHERE\n(we.event_type = 'STARTING' OR we.event_type = 'RUNNING')\nAND NOT EXISTS (\nSELECT 1\nFROM compute.warehouse_events we2\nWHERE we2.warehouse_id = we.warehouse_id\nAND we2.event_time < we.event_time\n)\n\n``` \n**Alert opportunity**: Alerting on new warehouses can help your organization track resource allocation. For example, you could set an alert that\u2019s triggered every time a new warehouse starts. \n### Investigate billing charges \nIf you want to understand specifically what a warehouse was doing to generate billing charges, this query can tell you the exact dates and times the warehouse scaled up or down, or started and stopped. 
\n```\nuse catalog `system`;\n\nSELECT\nwe.warehouse_id AS warehouse_id,\nwe.event_type AS event,\nwe.event_time AS event_time,\nwe.cluster_count AS cluster_count\nFROM\ncompute.warehouse_events AS we\nWHERE\nwe.event_type IN (\n'STARTING', 'RUNNING', 'STOPPING', 'STOPPED',\n'SCALING_UP', 'SCALED_UP', 'SCALING_DOWN', 'SCALED_DOWN'\n)\nAND MONTH(we.event_time) = 7\nAND YEAR(we.event_time) = YEAR(CURRENT_DATE())\nAND we.warehouse_id = '19c9d68652189278'\nORDER BY\nevent_time DESC\n\n``` \n### Which warehouses haven\u2019t been used in the last 30 days? \nThis query helps you identify unused resources, providing an opportunity for cost optimization. \n```\nuse catalog `system`;\n\nSELECT\nwe.warehouse_id,\nwe.event_time,\nwe.event_type,\nwe.cluster_count\nFROM\ncompute.warehouse_events AS we\nWHERE\nwe.warehouse_id IN (\nSELECT DISTINCT\nwarehouse_id\nFROM\ncompute.warehouse_events\nWHERE\nMONTH(event_time) = 6\nAND YEAR(event_time) = YEAR(CURRENT_DATE())\n)\nAND we.warehouse_id NOT IN (\nSELECT DISTINCT\nwarehouse_id\nFROM\ncompute.warehouse_events\nWHERE\nMONTH(event_time) = 7\nAND YEAR(event_time) = YEAR(CURRENT_DATE())\n)\nORDER BY\nevent_time DESC\n\n``` \n**Alert opportunity**: Receiving an alert on unused resources could help your organization optimize costs. For example, you could set an alert that\u2019s triggered when the query detects an unused warehouse. \n### Warehouses with the most uptime in a month \nThis query shows which warehouses have been used the most during a specific month. This query uses July as an example. 
\n```\nuse catalog `system`;\n\nSELECT\nwarehouse_id,\nSUM(TIMESTAMPDIFF(MINUTE, start_time, end_time)) / 60.0 AS uptime_hours\nFROM (\nSELECT\nstarting.warehouse_id,\nstarting.event_time AS start_time,\n(\nSELECT\nMIN(stopping.event_time)\nFROM\ncompute.warehouse_events AS stopping\nWHERE\nstopping.warehouse_id = starting.warehouse_id\nAND stopping.event_type = 'STOPPED'\nAND stopping.event_time > starting.event_time\n) AS end_time\nFROM\ncompute.warehouse_events AS starting\nWHERE\nstarting.event_type = 'STARTING'\nAND MONTH(starting.event_time) = 7\nAND YEAR(starting.event_time) = YEAR(CURRENT_DATE())\n) AS warehouse_uptime\nWHERE\nend_time IS NOT NULL\nGROUP BY\nwarehouse_id\nORDER BY\nuptime_hours DESC\n\n``` \n**Alert opportunity**: You might want to keep track of high-utilization warehouses. For example, you could set an alert that\u2019s triggered when the uptime hours for a warehouse exceed a specific threshold. \n### Warehouses that spent the most time upscaled during a month \nThis query informs you about warehouses that have spent significant time in the upscaled state during a month. This query uses July as an example. 
\n```\nuse catalog `system`;\n\nSELECT\nwarehouse_id,\nSUM(TIMESTAMPDIFF(MINUTE, upscaled_time, downscaled_time)) / 60.0 AS upscaled_hours\nFROM (\nSELECT\nupscaled.warehouse_id,\nupscaled.event_time AS upscaled_time,\n(\nSELECT\nMIN(downscaled.event_time)\nFROM\ncompute.warehouse_events AS downscaled\nWHERE\ndownscaled.warehouse_id = upscaled.warehouse_id\nAND (downscaled.event_type = 'SCALED_DOWN' OR downscaled.event_type = 'STOPPED')\nAND downscaled.event_time > upscaled.event_time\n) AS downscaled_time\nFROM\ncompute.warehouse_events AS upscaled\nWHERE\nupscaled.event_type = 'SCALED_UP'\nAND upscaled.cluster_count >= 2\nAND MONTH(upscaled.event_time) = 7\nAND YEAR(upscaled.event_time) = YEAR(CURRENT_DATE())\n) AS warehouse_upscaled\nWHERE\ndownscaled_time IS NOT NULL\nGROUP BY\nwarehouse_id\nORDER BY\nupscaled_hours DESC\n\n``` \n**Alert opportunity**: You might want to keep track of warehouses that spend a long time upscaled. For example, you could set an alert that\u2019s triggered when the upscaled hours for a warehouse exceed a specific threshold.\n\n", "chunk_id": "70a6b7b4bc1c2f755700e4cd4206dcae", "url": "https://docs.databricks.com/admin/system-tables/warehouse-events.html"} +{"chunked_text": "# Databricks administration introduction\n## Manage your workspace\n#### Enforce AWS Instance Metadata Service v2 on a workspace\n\nImportant \nBecause serverless compute resources automatically enforce IMDSv2, this setting does not apply to serverless compute resources. \nInstance metadata service (IMDS) is a service that runs locally on compute instances in AWS and is used to retrieve instance metadata. Crucially for security, instance metadata also includes credentials for the role associated with the instance. See [Instance metadata and user data](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html). 
\nIn response to security concerns around IMDS, AWS created IMDSv2 (version 2), which reduces risk from a common attack pattern and replaces the request-and-response flow with a session-oriented flow. For details of the improvements, see [this AWS blog article](https://aws.amazon.com/blogs/security/defense-in-depth-open-firewalls-reverse-proxies-ssrf-vulnerabilities-ec2-instance-metadata-service/). \nAs a workspace admin, you can enforce the use of IMDSv2 on clusters by enabling **Enforce AWS instance metadata v2** in the **Compute** tab of the admin settings page. Databricks recommends that you configure your workspace to enforce IMDSv2. If your workspace was created after October 1, 2022, your workspace has this admin setting enabled by default.\n\n", "chunk_id": "45c3875f19e353855f4e68a8fba098e6", "url": "https://docs.databricks.com/admin/cloud-configurations/aws/imdsv2.html"} +{"chunked_text": "# Databricks administration introduction\n## Manage your workspace\n#### Enforce AWS Instance Metadata Service v2 on a workspace\n##### Requirements\n\n* IMDSv2 enforcement does not support use of an isolated [AWS Glue catalog](https://aws.amazon.com/glue). 
To disable isolation, see [How to migrate and enforce IMDSv2 for all clusters](https://docs.databricks.com/admin/cloud-configurations/aws/imdsv2.html#migrate).\n* IMDSv2 enforcement requires use of a supported Databricks Runtime version as listed on [Databricks Runtime release notes versions and compatibility](https://docs.databricks.com/release-notes/runtime/index.html). The Light 2.4 Extended Support version is not supported.\n\n", "chunk_id": "061ffd0c53d3db5cfe9354f946d86d75", "url": "https://docs.databricks.com/admin/cloud-configurations/aws/imdsv2.html"} +{"chunked_text": "# Databricks administration introduction\n## Manage your workspace\n#### Enforce AWS Instance Metadata Service v2 on a workspace\n##### How to migrate and enforce IMDSv2 for all clusters\n\nWarning \nEnforcing IMDSv2 causes any existing workloads to fail if they use IMDSv1 to fetch instance metadata. \nTo enforce IMDSv2 on new, non-serverless clusters: \n1. IMDSv2 enforcement does not support use of an isolated [AWS Glue catalog](https://aws.amazon.com/glue). To use Glue catalog, add one Spark conf line to your clusters to disable the isolation mode: \n```\nspark.databricks.hive.metastore.glueCatalog.isolation.enabled false\n\n```\n2. Upgrade your code to use IMDSv2. \n1. Upgrade any existing AWS CLIs and SDKs that your workloads use. Note that Databricks has already upgraded the SDK that is installed by default in the Databricks Runtime. Databricks recommends that you follow AWS\u2019s [upgrade guide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html) to ensure a safe transition. \nModify all notebooks in the workspace to remove any existing IMDSv1 usage and replace it with IMDSv2 usage. 
\nFor example, the following is IMDSv1 API client code: \n```\ncurl http://169.254.169.254/latest/meta-data/\n\n``` \nFor that example, change it to IMDSv2 API client code: \n```\nTOKEN=`curl -X PUT \"http://169.254.169.254/latest/api/token\" \\\n-H \"X-aws-ec2-metadata-token-ttl-seconds: 21600\"` && \\\n\\\ncurl -H \"X-aws-ec2-metadata-token: $TOKEN\" \\\n-v http://169.254.169.254/latest/meta-data/\n\n``` \nFor more guidance and examples, see the AWS article [Retrieve instance metadata](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html).\n2. Test your modified code to ensure it works correctly with IMDSv2.\n3. Enable enforcement of IMDSv2 for the workspace. \n1. As a workspace admin, go to the [settings page](https://docs.databricks.com/admin/index.html#admin-settings).\n2. Click the **Compute** tab.\n3. Click **Enforce AWS instance metadata v2**.\n4. Refresh the page to ensure that the setting took effect.\n4. Restart any running clusters to ensure that all EC2 instances have IMDSv2 enforced. If clusters are attached to a fleet instance pool, create a new fleet instance pool and recreate the clusters using the new fleet instance pool.\n5. 
Monitor the CloudWatch metric `MetadataNoToken` to ensure that your workspace is not making any active IMDSv1 calls.\n\n", "chunk_id": "db03f15d1d896f3242ebc299ad33c01b", "url": "https://docs.databricks.com/admin/cloud-configurations/aws/imdsv2.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `json_tuple` table-valued generator function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nReturns multiple JSON objects as a tuple.\n\n####### `json_tuple` table-valued generator function\n######## Syntax\n\n```\njson_tuple(jsonStr, path1 [, ...] )\n\n```\n\n####### `json_tuple` table-valued generator function\n######## Arguments\n\n* `jsonStr`: A STRING expression with well-formed JSON.\n* `pathN`: A STRING literal with a JSON path.\n\n", "chunk_id": "65a09eb6576def45a5209df84663b501", "url": "https://docs.databricks.com/sql/language-manual/functions/json_tuple.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `json_tuple` table-valued generator function\n######## Returns\n\nA single row composed of the JSON objects. \nIf any object cannot be found, `NULL` is returned for that object. 
\n* **Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 12.1 and earlier: \n`json_tuple` can only be placed in the `SELECT` list as the root of an expression or following a [LATERAL VIEW](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-lateral-view.html).\nWhen placing the function in the `SELECT` list there must be no other generator function in the same `SELECT` list or [UNSUPPORTED\\_GENERATOR.MULTI\\_GENERATOR](https://docs.databricks.com/error-messages/unsupported-generator-error-class.html#multi_generator) is raised.\n* **Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 12.2 LTS and above: \nInvocation from the [LATERAL VIEW clause](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-lateral-view.html) or the `SELECT` list is deprecated.\nInstead, invoke `json_tuple` as a [table\\_reference](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-table-reference.html).\n\n", "chunk_id": "cebc206a0de695742ecf9a822d946374", "url": "https://docs.databricks.com/sql/language-manual/functions/json_tuple.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `json_tuple` table-valued generator function\n######## Examples\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 12.1 and earlier: \n```\n> SELECT json_tuple('{\"a\":1, \"b\":2}', 'a', 'b'), 'Spark SQL';\n1 2 Spark SQL\n\n> SELECT json_tuple('{\"a\":1, \"b\":2}', 'a', 'c'), 'Spark SQL';\n1 NULL Spark SQL\n\n> SELECT json_tuple('{\"a\":1, \"b\":2}', 'a', 'c'), json_tuple('{\"c\":1, \"d\":2}', 'c', 'd'), 'Spark SQL';\nError: UNSUPPORTED_GENERATOR.MULTI_GENERATOR\n\n``` \n**Applies to:** ![check 
marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 12.2 LTS and above: \n```\n> SELECT j.*, 'Spark SQL' FROM json_tuple('{\"a\":1, \"b\":2}', 'a', 'b') AS j;\n1 2 Spark SQL\n\n> SELECT j.*, 'Spark SQL' FROM json_tuple('{\"a\":1, \"b\":2}', 'a', 'c') AS j;\n1 NULL Spark SQL\n\n> SELECT j1.*, j2.*, 'Spark SQL'\nFROM json_tuple('{\"a\":1, \"b\":2}', 'a', 'c') AS j1,\njson_tuple('{\"c\":1, \"d\":2}', 'c', 'd') AS j2;\n\n```\n\n", "chunk_id": "faead64ab8239fbca9dc2487d205051a", "url": "https://docs.databricks.com/sql/language-manual/functions/json_tuple.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `json_tuple` table-valued generator function\n######## Related functions\n\n* [: operator](https://docs.databricks.com/sql/language-manual/functions/colonsign.html)\n* [json\\_object\\_keys function](https://docs.databricks.com/sql/language-manual/functions/json_object_keys.html)\n* [json\\_array\\_length function](https://docs.databricks.com/sql/language-manual/functions/json_array_length.html)\n* [json\\_tuple table-valued generator function](https://docs.databricks.com/sql/language-manual/functions/json_tuple.html)\n* [from\\_json function](https://docs.databricks.com/sql/language-manual/functions/from_json.html)\n* [get\\_json\\_object function](https://docs.databricks.com/sql/language-manual/functions/get_json_object.html)\n* [schema\\_of\\_json function](https://docs.databricks.com/sql/language-manual/functions/schema_of_json.html)\n* [to\\_json function](https://docs.databricks.com/sql/language-manual/functions/to_json.html)\n\n", "chunk_id": "d81f7df5c3e85bf494e94e94944b9ad8", "url": "https://docs.databricks.com/sql/language-manual/functions/json_tuple.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release 
notes\n#### July 2023\n\nThese features and Databricks platform improvements were released in July 2023. \nNote \nReleases are staged. Your Databricks workspace might not be updated until a week or more after the initial release date.\n\n#### July 2023\n##### Email addresses in Databricks are now case insensitive\n\n**July 31, 2023** \nUsernames (email addresses) in Databricks are now case insensitive. Previously, email addresses were case sensitive. Users can now log in regardless of the letter case they use in their email address.\n\n#### July 2023\n##### Workspace admins can now create account groups\n\n**July 31, 2023** \nWorkspace admins can now create account groups from their [identity federated workspaces](https://docs.databricks.com/admin/users-groups/index.html#assign-users-to-workspaces). Previously, workspace admins could create only [workspace-local groups](https://docs.databricks.com/admin/users-groups/workspace-local-groups.html). Account groups can be granted access to data in a [Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/index.html) metastore and access to identity-federated workspaces. Databricks recommends that you use account groups instead of workspace-local groups to take advantage of Unity Catalog and a central place to administer identity. \nSee [Manage account groups using the workspace admin settings page](https://docs.databricks.com/admin/users-groups/groups.html#workspace-settings).\n\n", "chunk_id": "8aff553702f041e5c527826f067a4c01", "url": "https://docs.databricks.com/release-notes/product/2023/july.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### July 2023\n##### Group manager role is in Public Preview\n\n**July 31, 2023** \nYou can now grant Databricks users, service principals, and groups permissions to manage groups. Group managers can manage group membership. They can also assign other users the group manager role. Account admins have the group manager role on all groups in the account. 
Workspace admins have the group manager role on account groups that they create. \nSee [Who can manage account groups?](https://docs.databricks.com/admin/users-groups/groups.html#who-can-manage-groups).\n\n#### July 2023\n##### Databricks CLI updated to version 0.202.0 (Public Preview)\n\n**July 27, 2023** \nThe [Databricks command-line interface (Databricks CLI)](https://docs.databricks.com/dev-tools/cli/index.html) has been updated to version 0.202.0. For details, see the changelog for version [0.202.0](https://github.com/databricks/cli/releases/tag/v0.202.0).\n\n#### July 2023\n##### Databricks SDK for Python updated to version 0.3.0 (Beta)\n\n**July 27, 2023** \n[Databricks SDK for Python](https://docs.databricks.com/dev-tools/sdk-python.html) version 0.3.0 introduces support for the Account Network Policy service; handles nested query parameters; adds, removes, and creates various methods, fields, dataclasses, and services; and more. For details, see the changelog for version [0.3.0](https://github.com/databricks/databricks-sdk-py/releases/tag/v0.3.0).\n\n", "chunk_id": "504bfe337beb818ebf721e2d70833c57", "url": "https://docs.databricks.com/release-notes/product/2023/july.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### July 2023\n##### Databricks SDK for Go updated to version 0.14.1 (Beta)\n\n**July 27, 2023** \n[Databricks SDK for Go](https://docs.databricks.com/dev-tools/sdk-go.html) version 0.14.1 handles nested query parameters appropriately, fixing filtering in QueryHistory listings. 
For details, see the changelog for version [0.14.1](https://github.com/databricks/databricks-sdk-go/releases/tag/v0.14.1).\n\n#### July 2023\n##### Databricks SDK for Go updated to version 0.14.0 (Beta)\n\n**July 26, 2023** \n[Databricks SDK for Go](https://docs.databricks.com/dev-tools/sdk-go.html) version 0.14.0 adds support for the Account Network Policy service, and adds, removes, and changes several methods, fields, and types. For details, see the changelog for version [0.14.0](https://github.com/databricks/databricks-sdk-go/releases/tag/v0.14.0).\n\n", "chunk_id": "6d94b960a815c8dc2538898429149a33", "url": "https://docs.databricks.com/release-notes/product/2023/july.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### July 2023\n##### Run another job as a task in a Databricks job\n\n**July 25, 2023** \nYou can use the new `Run Job` task to add another job as a task in your Databricks job, allowing you to decompose a large processing workflow into multiple component jobs, or create reusable components to use in multiple jobs. For information about using the task in the Jobs UI, see [Create and run Databricks Jobs](https://docs.databricks.com/workflows/jobs/create-run-jobs.html). For information about using the task with the Jobs REST API, see [Jobs](https://docs.databricks.com/api/workspace/jobs) in the REST API 2.1 reference or the [Jobs API 2.0 reference](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html).\n\n#### July 2023\n##### All users can access data products in Databricks Marketplace by default\n\n**July 24, 2023** \nStarting on July 24 and rolling out gradually to all eligible workspaces, all users will be able to install free or previously purchased datasets using Databricks Marketplace. This is made possible by granting the `USE MARKETPLACE ASSETS` privilege to all users in all Unity Catalog metastores. 
This new privilege has no cost impact, since all Marketplace transactions take place outside of Databricks. If you\u2019d like to disable access, see [Disable Marketplace access](https://docs.databricks.com/marketplace/get-started-consumer.html#revoke-use-assets).\n\n", "chunk_id": "ea3c664842d0aaa808822c654722d66d", "url": "https://docs.databricks.com/release-notes/product/2023/july.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### July 2023\n##### Classic keyboard shortcuts mode\n\n**July 21, 2023** \nA new setting allows you to use features from the new editor (such as autocomplete as you type, syntax highlighting, code formatting, and more) while retaining the same familiar keyboard shortcuts from the previous editor. To access this setting, [open the editor configuration](https://docs.databricks.com/notebooks/notebooks-manage.html#configure-editor-settings) and toggle **Notebook editor shortcuts mode** to **Classic (Code Mirror)**.\n\n#### July 2023\n##### Lakehouse Federation lets you run queries against external database providers (Public Preview)\n\n**July 21, 2023** \nLakehouse Federation is the query federation platform for Databricks, taking advantage of Unity Catalog to enable users and systems to run queries against multiple external data sources without needing to migrate all data to a unified system. \nLakehouse Federation is intended for use cases like: \n* Ad hoc reporting.\n* Proof-of-concept work.\n* The exploratory phase of new ETL pipelines or reports.\n* Supporting workloads during incremental migration. \nIn each of these scenarios, query federation gets you to insights faster, because you can query the data in place and avoid complex and time-consuming ETL processing. Plus you get the advantages of Unity Catalog interfaces and data governance, including fine-grained access control, data lineage, and search. 
\nSee [What is Lakehouse Federation](https://docs.databricks.com/query-federation/index.html).\n\n#### July 2023\n##### Move to trash enabled for Repos\n\n**July 19, 2023** \nDeleting repos now works the same way as deleting other workspace assets. In the Repos list, right-click the repo name and select **Move to Trash**. After 30 days, the contents of the Trash folder are permanently deleted. \nSee [Delete an object](https://docs.databricks.com/workspace/workspace-objects.html#delete-object).\n\n", "chunk_id": "7ce6ad9cbf65ccf29165fcb5f1da22a9", "url": "https://docs.databricks.com/release-notes/product/2023/july.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### July 2023\n##### Create alerts for slow-running or stuck jobs\n\n**July 18, 2023** \nYou can now configure an expected duration for a Databricks job or job task and add notifications if a job or task exceeds the expected duration. This feature lets you get alerts for slow-running jobs without the run being canceled. To configure an expected duration in the Databricks Workflows UI, see [Configure an expected completion time or a timeout for a job](https://docs.databricks.com/workflows/jobs/settings.html#timeout-setting-job). To configure the expected duration with the Jobs 2.1 API, see [Jobs](https://docs.databricks.com/api/workspace/jobs) in the REST API reference.\n\n#### July 2023\n##### Databricks SDK for Go updated to version 0.13.0 (Beta)\n\n**July 18, 2023** \n[Databricks SDK for Go](https://docs.databricks.com/dev-tools/sdk-go.html) version 0.13.0 adds, changes, and removes several methods and fields, and more. 
For details, see the changelog for version [0.13.0](https://github.com/databricks/databricks-sdk-go/releases/tag/v0.13.0).\n\n#### July 2023\n##### Databricks SDK for Python updated to version 0.2.0 (Beta)\n\n**July 18, 2023** \nThe [Databricks SDK for Python](https://docs.databricks.com/dev-tools/sdk-python.html) version 0.2.0 adds a local implementation of `dbutils.widgets`, adds, changes, and removes several methods, fields, and dataclasses, and more. For details, see the changelog for version [0.2.0](https://github.com/databricks/databricks-sdk-py/releases/tag/v0.2.0).\n\n", "chunk_id": "a47f4482c84be0436c37a1cf9111c2f3", "url": "https://docs.databricks.com/release-notes/product/2023/july.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### July 2023\n##### Databricks CLI updated to version 0.201.0 (Public Preview)\n\n**July 18, 2023** \nThe [Databricks command-line interface (Databricks CLI)](https://docs.databricks.com/dev-tools/cli/index.html) version 0.201.0 improves the `auth login` experience, supports tab completion for referencing Databricks authentication configuration profiles, makes additions, changes, and removals of several command groups and commands, and more. For details, see the changelog for version [0.201.0](https://github.com/databricks/cli/releases/tag/v0.201.0).\n\n#### July 2023\n##### Databricks SDK for Python updated to version 0.2.1 (Beta)\n\n**July 18, 2023** \n[Databricks SDK for Python](https://docs.databricks.com/dev-tools/sdk-python.html) version 0.2.1 supports older versions of `urllib`. For details, see the changelog for version [0.2.1](https://github.com/databricks/databricks-sdk-py/releases/tag/v0.2.1).\n\n#### July 2023\n##### Databricks Assistant is in Public Preview\n\n**July 18, 2023** \nThe Databricks Assistant works as an AI-based companion pair-programmer to make you more efficient as you create notebooks, queries, and files. 
It can help you rapidly answer questions by generating, optimizing, completing, explaining, and fixing code and queries. See the [Databricks Assistant FAQ](https://docs.databricks.com/notebooks/databricks-assistant-faq.html) for more information and for instructions on how to enable the Assistant.\n\n", "chunk_id": "df5c31369a7b38e594f748b659081eba", "url": "https://docs.databricks.com/release-notes/product/2023/july.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### July 2023\n##### Deactivate users and service principals from your account\n\n**July 13, 2023** \nYou can now deactivate users and service principals from your Databricks account. A deactivated user cannot log in to the Databricks account or identity federated workspaces. However, all of the user\u2019s permissions and workspace objects remain unchanged. For more information, see [Deactivate a user in your Databricks account](https://docs.databricks.com/admin/users-groups/users.html#deactivate-user) and [Deactivate a service principal in your Databricks account](https://docs.databricks.com/admin/users-groups/service-principals.html#deactivate-sp).\n\n#### July 2023\n##### Account-level SCIM provisioning now deactivates users when they are deactivated in the identity provider\n\n**July 13, 2023** \nAccount-level SCIM provisioning now deactivates users when they are deactivated in your identity provider. Previously, when a user was deactivated in an identity provider, account-level SCIM provisioning deleted them from the Databricks account. 
For more information, see [Sync users and groups from your identity provider](https://docs.databricks.com/admin/users-groups/scim/index.html) and [Deactivate a user in your Databricks account](https://docs.databricks.com/admin/users-groups/users.html#deactivate-user).\n\n#### July 2023\n##### Trash directory admin access\n\n**July 13, 2023** \nWorkspace admins can now access other users\u2019 Trash directories.\n\n", "chunk_id": "617e148d3776867dd974de20d4764610", "url": "https://docs.databricks.com/release-notes/product/2023/july.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### July 2023\n##### Prevention of MIME type sniffing and XSS attack page rendering are now always enabled\n\n**July 12, 2023** \nDatabricks reduces the risk of MIME type sniffing and XSS attack page rendering by adding appropriate HTTP headers. These features were enabled by default and previously could be disabled. For improved security, Databricks now always enables both features and there are no longer workspace admin settings to disable them. For workspaces that previously disabled one or both of these features, both features are now enabled.\n\n#### July 2023\n##### Unity Catalog volumes are in Public Preview\n\n**July 12, 2023** \nYou can now use volumes in Unity Catalog to manage access to cloud object storage locations on Databricks Runtime 13.2 and above. Volumes provide capabilities for accessing, storing, governing, and organizing files. See [Create and work with volumes](https://docs.databricks.com/connect/unity-catalog/volumes.html).\n\n#### July 2023\n##### Simplified experience for submitting product feedback from the workspace\n\n**July 11, 2023** \nIt\u2019s now easier to send product feedback from your workspace. You can submit feedback and attach screenshots with fewer steps and without leaving the workspace UI. 
For more information, see [Submit feedback from the workspace](https://docs.databricks.com/resources/ideas.html#in-product-feedback).\n\n#### July 2023\n##### Databricks extension for Visual Studio Code updated to version 1.1.0\n\n**July 10, 2023** \nThe [Databricks extension for Visual Studio Code](https://docs.databricks.com/dev-tools/vscode-ext/index.html) version 1.1.0 enables Databricks Connect integration by default, adds experimental features for Jupyter notebooks, and more. For details, see the changelog for version [1.1.0](https://github.com/databricks/databricks-vscode/releases/tag/release-v1.1.0).\n\n", "chunk_id": "595234b3f2db0c82a083478a109097c3", "url": "https://docs.databricks.com/release-notes/product/2023/july.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### July 2023\n##### Functions now displayed in Catalog Explorer (Public Preview)\n\n**July 10, 2023** \nSQL and Python user-defined functions are now visible in Catalog Explorer. You can view function details and manage permissions using this new UI.\n\n#### July 2023\n##### Databricks Terraform provider updated to version 1.21.0\n\n**July 7, 2023** \n[Databricks Terraform provider](https://docs.databricks.com/dev-tools/terraform/index.html) version 1.21.0 adds support for subscriptions in dashboards and alert SQL tasks in `databricks_job`, defines generic Databricks data utilities for defining workspace and account-level data sources, and more. For details, see the changelog for version [1.21.0](https://github.com/databricks/terraform-provider-databricks/releases/tag/v1.21.0).\n\n#### July 2023\n##### The maximum offset for the `List all jobs` and `List job runs` API requests is now limited\n\n**July 6, 2023** \nThe maximum offset you can specify in the `List all jobs` and `List job runs` requests in the Jobs API is now limited. To avoid this limit when you use these requests, use token-based pagination which does not have this limit. 
See [GET /api/2.1/jobs/list](https://docs.databricks.com/api/workspace/jobs/list) and [GET /api/2.1/jobs/runs/list](https://docs.databricks.com/api/workspace/jobs/listruns) in the REST API reference.\n\n", "chunk_id": "8a0ce77a082edcb13aa1da469ad11999", "url": "https://docs.databricks.com/release-notes/product/2023/july.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### July 2023\n##### Databricks Runtime 13.2 is GA\n\n**July 6, 2023** \nDatabricks Runtime 13.2 and Databricks Runtime 13.2 ML are now generally available. \nSee [Databricks Runtime 13.2 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/13.2.html) and [Databricks Runtime 13.2 for Machine Learning (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/13.2ml.html).\n\n#### July 2023\n##### Delta Sharing and Databricks Marketplace support view sharing (Public Preview)\n\n**July 6, 2023** \nYou can now share views in Delta Sharing and Databricks Marketplace. See [Add views to a share](https://docs.databricks.com/data-sharing/create-share.html#views).\n\n#### July 2023\n##### Init scripts on DBFS reach end of life on Sept 1, 2023\n\n**July 5, 2023** \nOn Sept 1, 2023, support for init scripts on DBFS will reach end of life and the feature will no longer function. 
[Store init scripts in workspace files](https://docs.databricks.com/files/workspace-init-scripts.html) to ensure that they continue to function after Sept 1, 2023.\n\n", "chunk_id": "318ae8844640973cb34bc90e3eef29eb", "url": "https://docs.databricks.com/release-notes/product/2023/july.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `h3_centeraswkb` function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 11.3 LTS and above \nReturns the center of the input H3 cell as a point in [WKB](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary) format.\n\n####### `h3_centeraswkb` function\n######## Syntax\n\n```\nh3_centeraswkb ( h3CellIdExpr )\n\n```\n\n####### `h3_centeraswkb` function\n######## Arguments\n\n* `h3CellIdExpr`: A BIGINT expression, or a hexadecimal STRING expression representing an H3 cell ID.\n\n####### `h3_centeraswkb` function\n######## Returns\n\nA value of the type BINARY representing the center of the input H3 cell as a point in [WKB](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary) format. \nThe function returns NULL if the input expression is NULL.\nThe function does partial validation regarding whether the input argument is a valid H3 cell ID. 
A necessary, but not sufficient condition for a valid H3 ID is that its value is between `0x08001fffffffffff` and `0x08ff3b6db6db6db6`.\nThe behavior of the function is undefined if the input cell ID is not a valid cell ID.\n\n", "chunk_id": "dc4c41e0f77c06a24f8911bbd143fa08", "url": "https://docs.databricks.com/sql/language-manual/functions/h3_centeraswkb.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `h3_centeraswkb` function\n######## Error conditions\n\n* If `h3CellIdExpr` is a STRING that cannot be converted to a BIGINT or corresponds to a BIGINT value that is smaller than `0x08001fffffffffff` or larger than `0x08ff3b6db6db6db6`, the function returns [H3\\_INVALID\\_CELL\\_ID](https://docs.databricks.com/error-messages/h3-invalid-cell-id-error-class.html).\n\n####### `h3_centeraswkb` function\n######## Examples\n\n```\n-- Input a BIGINT representing a hexagonal cell.\n> SELECT hex(h3_centeraswkb(599686042433355775))\n0101000000F5ACA5F17C7E5EC0833013F542AC4240\n\n-- Input a STRING representing a pentagonal cell.\n> SELECT hex(h3_centeraswkb('8009fffffffffff'))\n01010000009D8F6AAF881225404E2B56CDCC2C5040\n\n-- Input is an invalid H3 cell ID.\n> SELECT h3_centeraswkb(0)\n[H3_INVALID_CELL_ID] 0 is not a valid H3 cell ID\n\n```\n\n", "chunk_id": "9ceb7c86fcd5e8d95416ca5a34a76b25", "url": "https://docs.databricks.com/sql/language-manual/functions/h3_centeraswkb.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `h3_centeraswkb` function\n######## Related functions\n\n* [h3\\_boundaryasgeojson function](https://docs.databricks.com/sql/language-manual/functions/h3_boundaryasgeojson.html)\n* [h3\\_boundaryaswkb function](https://docs.databricks.com/sql/language-manual/functions/h3_boundaryaswkb.html)\n* [h3\\_boundaryaswkt 
function](https://docs.databricks.com/sql/language-manual/functions/h3_boundaryaswkt.html)\n* [h3\\_centerasgeojson function](https://docs.databricks.com/sql/language-manual/functions/h3_centerasgeojson.html)\n* [h3\\_centeraswkt function](https://docs.databricks.com/sql/language-manual/functions/h3_centeraswkt.html)\n\n", "chunk_id": "04d20c8341f48b103639e588935077fc", "url": "https://docs.databricks.com/sql/language-manual/functions/h3_centeraswkb.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `decode` (character set) function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nTranslates binary `expr` to a string using the character set encoding `charSet`.\n\n####### `decode` (character set) function\n######## Syntax\n\n```\ndecode(expr, charSet)\n\n```\n\n####### `decode` (character set) function\n######## Arguments\n\n* `expr`: A `BINARY` expression encoded in `charSet`.\n* `charSet`: A `STRING` expression.\n\n####### `decode` (character set) function\n######## Returns\n\nA `STRING`. \nIf `charSet` does not match the encoding, the result is undefined. \nThe following character set encodings are supported (case-insensitive): \n* `'US-ASCII'`: Seven-bit ASCII, ISO646-US.\n* `'ISO-8859-1'`: ISO Latin Alphabet No.
1, ISO-LATIN-1.\n* `'UTF-8'`: Eight-bit UCS Transformation Format.\n* `'UTF-16BE'`: Sixteen-bit UCS Transformation Format, big-endian byte order.\n* `'UTF-16LE'`: Sixteen-bit UCS Transformation Format, little-endian byte order.\n* `'UTF-16'`: Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark.\n\n", "chunk_id": "bbe198ba61622bc30c6d39efe27fd64c", "url": "https://docs.databricks.com/sql/language-manual/functions/decode_cs.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `decode` (character set) function\n######## Examples\n\n```\n-- Wrap encode in hex to normalize UI dependent BINARY output.\n> SELECT hex(encode('Spark SQL', 'UTF-16'));\nFEFF0053007000610072006B002000530051004C\n\n> SELECT hex(encode('Spark SQL', 'US-ASCII'));\n537061726B2053514C\n\n> SELECT decode(X'FEFF0053007000610072006B002000530051004C', 'UTF-16')\nSpark SQL\n\n```\n\n####### `decode` (character set) function\n######## Related functions\n\n* [encode function](https://docs.databricks.com/sql/language-manual/functions/encode.html)\n* [decode (key) function](https://docs.databricks.com/sql/language-manual/functions/decode.html)\n\n", "chunk_id": "9b22b3b38835b9d3ed2e4241054a6c4c", "url": "https://docs.databricks.com/sql/language-manual/functions/decode_cs.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `h3_stringtoh3` function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 11.3 LTS and above \nConverts the input string, which is expected to be a hexadecimal string representing an H3 cell, to the corresponding BIGINT representation of the H3 cell.\n\n####### `h3_stringtoh3` 
function\n######## Syntax\n\n```\nh3_stringtoh3 ( h3CellIdStringExpr )\n\n```\n\n####### `h3_stringtoh3` function\n######## Arguments\n\n* `h3CellIdStringExpr`: A well-formed hexadecimal STRING expression representing a valid H3 cell ID.\n\n####### `h3_stringtoh3` function\n######## Returns\n\nA value of type BIGINT. The returned value is the BIGINT representation of the input hexadecimal string. \nThe function returns NULL if the input is NULL.\nThe function converts the hexadecimal string to the corresponding BIGINT number.\nThe function does partial validation regarding whether the input argument is a valid H3 cell ID. A necessary, but not sufficient condition for a valid H3 ID is that its value is between `0x08001fffffffffff` and `0x08ff3b6db6db6db6`.\nThe behavior of the function is undefined if the input cell ID is not a valid cell ID.\n\n", "chunk_id": "842497292fa0c45bfb92fd72b4acc076", "url": "https://docs.databricks.com/sql/language-manual/functions/h3_stringtoh3.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `h3_stringtoh3` function\n######## Error conditions\n\n* If the value of `h3CellIdStringExpr` cannot be converted to a BIGINT or if the value corresponds to a BIGINT value that is smaller than `0x08001fffffffffff` or larger than `0x08ff3b6db6db6db6`, the function returns [H3\\_INVALID\\_CELL\\_ID](https://docs.databricks.com/error-messages/h3-invalid-cell-id-error-class.html).\n\n####### `h3_stringtoh3` function\n######## Examples\n\n```\n-- Simple example\n> SELECT h3_stringtoh3('85283473fffffff')\n599686042433355775\n\n-- Input is an invalid H3 cell ID.\n> SELECT h3_stringtoh3('0')\n[H3_INVALID_CELL_ID] 0 is not a valid H3 cell ID\n\n```\n\n####### `h3_stringtoh3` function\n######## Related functions\n\n* [h3\\_h3tostring function](https://docs.databricks.com/sql/language-manual/functions/h3_h3tostring.html)\n\n", "chunk_id":
"1ccfa67874961b9d05a61a0bfe7a5dcd", "url": "https://docs.databricks.com/sql/language-manual/functions/h3_stringtoh3.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### Error classes in Databricks\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 12.2 and above \nError classes are descriptive, human-readable, strings unique to the error condition. \nYou can use error classes to programmatically handle errors in your application without the need to parse the error message. \nThis is a list of common, named error conditions returned by Databricks.\n\n", "chunk_id": "dd583f5824b17e1af26c57e1aca41bb3", "url": "https://docs.databricks.com/error-messages/error-classes.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### Error classes in Databricks\n##### Databricks Runtime and Databricks SQL\n\n### AGGREGATE\\_FUNCTION\\_WITH\\_NONDETERMINISTIC\\_EXPRESSION \n[SQLSTATE: 42845](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nNon-deterministic expression `` should not appear in the arguments of an aggregate function. \n### AI\\_FUNCTION\\_HTTP\\_PARSE\\_CAST\\_ERROR \n[SQLSTATE: 2203G](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to parse model output when casting to the specified returnType: \u201c``\u201d, response JSON was: \u201c``\u201d. Please update the returnType to match the contents of the type represented by the response JSON and then retry the query again. \n### AI\\_FUNCTION\\_HTTP\\_PARSE\\_COLUMNS\\_ERROR \n[SQLSTATE: 2203G](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThe actual model output has more than one column \u201c``\u201d. 
However, the specified return type[\u201c``\u201d] has only one column. Please update the returnType to contain the same number of columns as the model output and then retry the query again. \n### AI\\_FUNCTION\\_HTTP\\_REQUEST\\_ERROR \n[SQLSTATE: 08000](https://docs.databricks.com/error-messages/sqlstates.html#class-08-connection-exception) \nError occurred while making an HTTP request for function ``: `` \n### AI\\_FUNCTION\\_INVALID\\_HTTP\\_RESPONSE \n[SQLSTATE: 08000](https://docs.databricks.com/error-messages/sqlstates.html#class-08-connection-exception) \nInvalid HTTP response for function ``: `` \n### AI\\_FUNCTION\\_INVALID\\_MAX\\_WORDS \n[SQLSTATE: 22032](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThe maximum number of words must be a non-negative integer, but got ``. \n### AI\\_FUNCTION\\_JSON\\_PARSE\\_ERROR \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nError occurred while parsing the JSON response for function ``: `` \n### AI\\_FUNCTION\\_UNSUPPORTED\\_ERROR \n[SQLSTATE: 56038](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nThe function `` is not supported in the current environment. It is only available in Databricks SQL Pro and Serverless. \n### AI\\_FUNCTION\\_UNSUPPORTED\\_REQUEST \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nFailed to evaluate the SQL function \u201c``\u201d because the provided argument of `` has \u201c``\u201d, but only the following types are supported: ``. Please update the function call to provide an argument of string type and retry the query again. 
\n### AI\\_FUNCTION\\_UNSUPPORTED\\_RETURN\\_TYPE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nAI function: \u201c``\u201d does not support the following type as return type: \u201c``\u201d. Return type must be a valid SQL type understood by Catalyst and supported by AI function. Current supported types includes: `` \n### AI\\_INVALID\\_ARGUMENT\\_VALUE\\_ERROR \n[SQLSTATE: 22032](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nProvided value \u201c``\u201d is not supported by argument \u201c``\u201d. Supported values are: `` \n### ALL\\_PARAMETERS\\_MUST\\_BE\\_NAMED \n[SQLSTATE: 07001](https://docs.databricks.com/error-messages/sqlstates.html#class-07-dynamic-sql-error) \nUsing name parameterized queries requires all parameters to be named. Parameters missing names: ``. \n### ALL\\_PARTITION\\_COLUMNS\\_NOT\\_ALLOWED \n[SQLSTATE: KD005](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nCannot use all columns for partition columns. \n### ALTER\\_TABLE\\_COLUMN\\_DESCRIPTOR\\_DUPLICATE \n[SQLSTATE: 42710](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nALTER TABLE `` column `` specifies descriptor \u201c``\u201d more than once, which is invalid. \n### AMBIGUOUS\\_ALIAS\\_IN\\_NESTED\\_CTE \n[SQLSTATE: 42KD0](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nName `` is ambiguous in nested CTE. \nPlease set `` to \u201cCORRECTED\u201d so that name defined in inner CTE takes precedence. If set it to \u201cLEGACY\u201d, outer CTE definitions will take precedence. \nSee \u2019. \n### AMBIGUOUS\\_COLUMN\\_OR\\_FIELD \n[SQLSTATE: 42702](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nColumn or field `` is ambiguous and has `` matches. 
\n### AMBIGUOUS\\_COLUMN\\_REFERENCE \n[SQLSTATE: 42702](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nColumn `` is ambiguous. It\u2019s because you joined several DataFrame together, and some of these DataFrames are the same. \nThis column points to one of the DataFrame but Spark is unable to figure out which one. \nPlease alias the DataFrames with different names via `DataFrame.alias` before joining them, \nand specify the column using qualified name, e.g. `df.alias(\"a\").join(df.alias(\"b\"), col(\"a.id\") > col(\"b.id\"))`. \n### AMBIGUOUS\\_CONSTRAINT \n[SQLSTATE: 42K0C](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nAmbiguous reference to constraint ``. \n### AMBIGUOUS\\_LATERAL\\_COLUMN\\_ALIAS \n[SQLSTATE: 42702](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nLateral column alias `` is ambiguous and has `` matches. \n### AMBIGUOUS\\_REFERENCE \n[SQLSTATE: 42704](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nReference `` is ambiguous, could be: ``. \n### AMBIGUOUS\\_REFERENCE\\_TO\\_FIELDS \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nAmbiguous reference to the field ``. It appears `` times in the schema. \n### ANSI\\_CONFIG\\_CANNOT\\_BE\\_DISABLED \n[SQLSTATE: 56038](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nThe ANSI SQL configuration `` cannot be disabled in this product. \n### ARGUMENT\\_NOT\\_CONSTANT \n[SQLSTATE: 42K08](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe function `` includes a parameter `` at position `` that requires a constant argument. 
Please compute the argument `` separately and pass the result as a constant. \n### [ARITHMETIC\\_OVERFLOW](https://docs.databricks.com/error-messages/arithmetic-overflow-error-class.html) \n[SQLSTATE: 22003](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \n``.`` If necessary set `` to \u201cfalse\u201d to bypass this error. \nFor more details see [ARITHMETIC\\_OVERFLOW](https://docs.databricks.com/error-messages/arithmetic-overflow-error-class.html) \n### ASSIGNMENT\\_ARITY\\_MISMATCH \n[SQLSTATE: 42802](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe number of columns or variables assigned or aliased: `` does not match the number of source expressions: ``. \n### [AS\\_OF\\_JOIN](https://docs.databricks.com/error-messages/as-of-join-error-class.html) \n[SQLSTATE: 42604](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInvalid as-of join. \nFor more details see [AS\\_OF\\_JOIN](https://docs.databricks.com/error-messages/as-of-join-error-class.html) \n### AVRO\\_DEFAULT\\_VALUES\\_UNSUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe use of default values is not supported when `rescuedDataColumn` is enabled. You may be able to remove this check by setting `spark.databricks.sql.avro.rescuedDataBlockUserDefinedSchemaDefaultValue` to false, but the default values will not apply and null values will still be used. \n### AVRO\\_INCOMPATIBLE\\_READ\\_TYPE \n[SQLSTATE: 22KD3](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot convert Avro `` to SQL `` because the original encoded data type is ``, however you\u2019re trying to read the field as ``, which would lead to an incorrect answer. \nTo allow reading this field, enable the SQL configuration: \u201cspark.sql.legacy.avro.allowIncompatibleSchema\u201d.
\n### AVRO\\_POSITIONAL\\_FIELD\\_MATCHING\\_UNSUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe use of positional field matching is not supported when either `rescuedDataColumn` or `failOnUnknownFields` is enabled. Remove these options to proceed. \n### BATCH\\_METADATA\\_NOT\\_FOUND \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to find batch ``. \n### BIGQUERY\\_OPTIONS\\_ARE\\_MUTUALLY\\_EXCLUSIVE \n[SQLSTATE: 42616](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nBigQuery connection credentials must be specified with either the \u2018GoogleServiceAccountKeyJson\u2019 parameter or all of \u2018projectId\u2019, \u2018OAuthServiceAcctEmail\u2019, \u2018OAuthPvtKey\u2019 \n### BINARY\\_ARITHMETIC\\_OVERFLOW \n[SQLSTATE: 22003](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \n`` `` `` caused overflow. \n### BUILT\\_IN\\_CATALOG \n[SQLSTATE: 42832](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` doesn\u2019t support built-in catalogs. \n### CALL\\_ON\\_STREAMING\\_DATASET\\_UNSUPPORTED \n[SQLSTATE: 42KDE](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe method `` can not be called on streaming Dataset/DataFrame. \n### CANNOT\\_ALTER\\_PARTITION\\_COLUMN \n[SQLSTATE: 428FR](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nALTER TABLE (ALTER|CHANGE) COLUMN is not supported for partition columns, but found the partition column `` in the table ``. \n### CANNOT\\_CAST\\_DATATYPE \n[SQLSTATE: 42846](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot cast `` to ``. 
\n### CANNOT\\_CONVERT\\_PROTOBUF\\_FIELD\\_TYPE\\_TO\\_SQL\\_TYPE \n[SQLSTATE: 42846](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot convert Protobuf `` to SQL `` because schema is incompatible (protobufType = ``, sqlType = ``). \n### CANNOT\\_CONVERT\\_PROTOBUF\\_MESSAGE\\_TYPE\\_TO\\_SQL\\_TYPE \n[SQLSTATE: 42846](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to convert `` of Protobuf to SQL type ``. \n### CANNOT\\_CONVERT\\_SQL\\_TYPE\\_TO\\_PROTOBUF\\_FIELD\\_TYPE \n[SQLSTATE: 42846](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot convert SQL `` to Protobuf `` because schema is incompatible (protobufType = ``, sqlType = ``). \n### CANNOT\\_CONVERT\\_SQL\\_VALUE\\_TO\\_PROTOBUF\\_ENUM\\_TYPE \n[SQLSTATE: 42846](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot convert SQL `` to Protobuf `` because `` is not in defined values for enum: ``. \n### CANNOT\\_COPY\\_STATE \n[SQLSTATE: 0AKD0](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot copy catalog state like current database and temporary views from Unity Catalog to a legacy catalog. 
\n### [CANNOT\\_CREATE\\_DATA\\_SOURCE\\_TABLE](https://docs.databricks.com/error-messages/cannot-create-data-source-table-error-class.html) \n[SQLSTATE: 42KDE](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to create data source table ``: \nFor more details see [CANNOT\\_CREATE\\_DATA\\_SOURCE\\_TABLE](https://docs.databricks.com/error-messages/cannot-create-data-source-table-error-class.html) \n### CANNOT\\_DECODE\\_URL \n[SQLSTATE: 22546](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThe provided URL cannot be decoded: ``. Please ensure that the URL is properly formatted and try again. \n### CANNOT\\_DELETE\\_SYSTEM\\_OWNED \n[SQLSTATE: 42832](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nSystem owned `` cannot be deleted. \n### CANNOT\\_DROP\\_AMBIGUOUS\\_CONSTRAINT \n[SQLSTATE: 42K0C](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot drop the constraint with the name `` shared by a CHECK constraint \nand a PRIMARY KEY or FOREIGN KEY constraint. You can drop the PRIMARY KEY or \nFOREIGN KEY constraint by queries: \n* ALTER TABLE .. DROP PRIMARY KEY or\n* ALTER TABLE .. DROP FOREIGN KEY .. \n### CANNOT\\_ESTABLISH\\_CONNECTION \n[SQLSTATE: 08001](https://docs.databricks.com/error-messages/sqlstates.html#class-08-connection-exception) \nCannot establish connection to remote `` database. Please check connection information and credentials e.g. host, port, user, password and database options. \\*\\* If you believe the information is correct, please check your workspace\u2019s network setup and ensure it does not have outbound restrictions to the host. Please also check that the host does not block inbound connections from the network where the workspace\u2019s Spark clusters are deployed. \\*\\* Detailed error message: ``. 
\n### CANNOT\\_ESTABLISH\\_CONNECTION\\_SERVERLESS \n[SQLSTATE: 08001](https://docs.databricks.com/error-messages/sqlstates.html#class-08-connection-exception) \nCannot establish connection to remote `` database. Please check connection information and credentials e.g. host, port, user, password and database options. \\*\\* If you believe the information is correct, please allow inbound traffic from the Internet to your host, as you are using Serverless Compute. If your network policies do not allow inbound Internet traffic, please use non Serverless Compute, or you may reach out to your Databricks representative to learn about Serverless Private Networking. \\*\\* Detailed error message: ``. \n### CANNOT\\_INVOKE\\_IN\\_TRANSFORMATIONS \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDataset transformations and actions can only be invoked by the driver, not inside of other Dataset transformations; for example, dataset1.map(x => dataset2.values.count() \\* x) is invalid because the values transformation and count action cannot be performed inside of the dataset1.map transformation. For more information, see SPARK-28702. \n### CANNOT\\_LOAD\\_FUNCTION\\_CLASS \n[SQLSTATE: 46103](https://docs.databricks.com/error-messages/sqlstates.html#class-46-java-ddl-1) \nCannot load class `` when registering the function ``, please make sure it is on the classpath. \n### CANNOT\\_LOAD\\_PROTOBUF\\_CLASS \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCould not load Protobuf class with name ``. ``. \n### [CANNOT\\_LOAD\\_STATE\\_STORE](https://docs.databricks.com/error-messages/cannot-load-state-store-error-class.html) \n[SQLSTATE: 58030](https://docs.databricks.com/error-messages/sqlstates.html#class-58-system-error) \nAn error occurred during loading state. 
\nFor more details see [CANNOT\\_LOAD\\_STATE\\_STORE](https://docs.databricks.com/error-messages/cannot-load-state-store-error-class.html) \n### CANNOT\\_MERGE\\_INCOMPATIBLE\\_DATA\\_TYPE \n[SQLSTATE: 42825](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to merge incompatible data types `` and ``. Please check the data types of the columns being merged and ensure that they are compatible. If necessary, consider casting the columns to compatible data types before attempting the merge. \n### CANNOT\\_MERGE\\_SCHEMAS \n[SQLSTATE: 42KD9](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed merging schemas: \nInitial schema: \n`` \nSchema that cannot be merged with the initial schema: \n``. \n### CANNOT\\_MODIFY\\_CONFIG \n[SQLSTATE: 46110](https://docs.databricks.com/error-messages/sqlstates.html#class-46-java-ddl-1) \nCannot modify the value of the Spark config: ``. \nSee also \u2019. \n### CANNOT\\_PARSE\\_DECIMAL \n[SQLSTATE: 22018](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot parse decimal. Please ensure that the input is a valid number with optional decimal point or comma separators. \n### CANNOT\\_PARSE\\_INTERVAL \n[SQLSTATE: 22006](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nUnable to parse ``. Please ensure that the value provided is in a valid format for defining an interval. You can reference the documentation for the correct format. If the issue persists, please double check that the input value is not null or empty and try again. \n### CANNOT\\_PARSE\\_JSON\\_FIELD \n[SQLSTATE: 2203G](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot parse the field name `` and the value `` of the JSON token type `` to target Spark data type ``. 
\n### CANNOT\\_PARSE\\_PROTOBUF\\_DESCRIPTOR \n[SQLSTATE: 22018](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nError parsing descriptor bytes into Protobuf FileDescriptorSet. \n### CANNOT\\_PARSE\\_TIMESTAMP \n[SQLSTATE: 22007](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \n``. If necessary set `` to \u201cfalse\u201d to bypass this error. \n### CANNOT\\_READ\\_ARCHIVED\\_FILE \n[SQLSTATE: KD003](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nCannot read file at path `` because it has been archived. Please adjust your query filters to exclude archived files. \n### [CANNOT\\_READ\\_FILE](https://docs.databricks.com/error-messages/cannot-read-file-error-class.html) \n[SQLSTATE: KD003](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nCannot read `` file at path: ``. \nFor more details see [CANNOT\\_READ\\_FILE](https://docs.databricks.com/error-messages/cannot-read-file-error-class.html) \n### CANNOT\\_READ\\_FILE\\_FOOTER \n[SQLSTATE: KD001](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nCould not read footer for file: ``. Please ensure that the file is in either ORC or Parquet format. \nIf not, please convert it to a valid format. If the file is in the valid format, please check if it is corrupt. \nIf it is, you can choose to either ignore it or fix the corruption. \n### CANNOT\\_READ\\_SENSITIVE\\_KEY\\_FROM\\_SECURE\\_PROVIDER \n[SQLSTATE: 42501](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot read sensitive key \u2018``\u2019 from secure provider. \n### CANNOT\\_RECOGNIZE\\_HIVE\\_TYPE \n[SQLSTATE: 429BB](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot recognize hive type string: ``, column: ``. 
The specified data type for the field cannot be recognized by Spark SQL. Please check the data type of the specified field and ensure that it is a valid Spark SQL data type. Refer to the Spark SQL documentation for a list of valid data types and their format. If the data type is correct, please ensure that you are using a supported version of Spark SQL. \n### CANNOT\\_REFERENCE\\_UC\\_IN\\_HMS \n[SQLSTATE: 0AKD0](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot reference a Unity Catalog `` in Hive Metastore objects. \n### CANNOT\\_RENAME\\_ACROSS\\_CATALOG \n[SQLSTATE: 0AKD0](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nRenaming a `` across catalogs is not allowed. \n### CANNOT\\_RENAME\\_ACROSS\\_SCHEMA \n[SQLSTATE: 0AKD0](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nRenaming a `` across schemas is not allowed. \n### CANNOT\\_RESOLVE\\_DATAFRAME\\_COLUMN \n[SQLSTATE: 42704](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot resolve dataframe column ``. It\u2019s probably because of illegal references like `df1.select(df2.col(\"a\"))`. \n### CANNOT\\_RESOLVE\\_STAR\\_EXPAND \n[SQLSTATE: 42704](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot resolve ``.\\* given input columns ``. Please check that the specified table or struct exists and is accessible in the input columns. \n### CANNOT\\_RESTORE\\_PERMISSIONS\\_FOR\\_PATH \n[SQLSTATE: 58030](https://docs.databricks.com/error-messages/sqlstates.html#class-58-system-error) \nFailed to set permissions on created path `` back to ``. \n### CANNOT\\_SAVE\\_VARIANT \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot save variant data type into external storage. 
\n### CANNOT\\_SHALLOW\\_CLONE\\_ACROSS\\_UC\\_AND\\_HMS \n[SQLSTATE: 0AKD0](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot shallow-clone tables across Unity Catalog and Hive Metastore. \n### CANNOT\\_SHALLOW\\_CLONE\\_NESTED \n[SQLSTATE: 0AKUC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot shallow-clone a table `` that is already a shallow clone. \n### CANNOT\\_SHALLOW\\_CLONE\\_NON\\_UC\\_MANAGED\\_TABLE\\_AS\\_SOURCE\\_OR\\_TARGET \n[SQLSTATE: 0AKUC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nShallow clone is only supported for the MANAGED table type. The table `
` is not a MANAGED table. \n### [CANNOT\\_UPDATE\\_FIELD](https://docs.databricks.com/error-messages/cannot-update-field-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot update `
` field `` type: \nFor more details see [CANNOT\\_UPDATE\\_FIELD](https://docs.databricks.com/error-messages/cannot-update-field-error-class.html) \n### CANNOT\\_UP\\_CAST\\_DATATYPE \n[SQLSTATE: 42846](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot up cast `` from `` to ``. \n`
` \n### CANNOT\\_VALIDATE\\_CONNECTION \n[SQLSTATE: 08000](https://docs.databricks.com/error-messages/sqlstates.html#class-08-connection-exception) \nValidation of `` connection is not supported. Please contact Databricks support for alternative solutions, or set \u201cspark.databricks.testConnectionBeforeCreation\u201d to \u201cfalse\u201d to skip connection testing before creating a connection object. \n### [CANNOT\\_WRITE\\_STATE\\_STORE](https://docs.databricks.com/error-messages/cannot-write-state-store-error-class.html) \n[SQLSTATE: 58030](https://docs.databricks.com/error-messages/sqlstates.html#class-58-system-error) \nError writing state store files for provider ``. \nFor more details see [CANNOT\\_WRITE\\_STATE\\_STORE](https://docs.databricks.com/error-messages/cannot-write-state-store-error-class.html) \n### [CAST\\_INVALID\\_INPUT](https://docs.databricks.com/error-messages/cast-invalid-input-error-class.html) \n[SQLSTATE: 22018](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThe value `` of the type `` cannot be cast to `` because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set `` to \u201cfalse\u201d to bypass this error. \nFor more details see [CAST\\_INVALID\\_INPUT](https://docs.databricks.com/error-messages/cast-invalid-input-error-class.html) \n### CAST\\_OVERFLOW \n[SQLSTATE: 22003](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThe value `` of the type `` cannot be cast to `` due to an overflow. Use `try_cast` to tolerate overflow and return NULL instead. If necessary set `` to \u201cfalse\u201d to bypass this error. \n### CAST\\_OVERFLOW\\_IN\\_TABLE\\_INSERT \n[SQLSTATE: 22003](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFail to assign a value of `` type to the `` type column or variable `` due to an overflow. 
Use `try_cast` on the input value to tolerate overflow and return NULL instead. \n### CHECKPOINT\\_RDD\\_BLOCK\\_ID\\_NOT\\_FOUND \n[SQLSTATE: 56000](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nCheckpoint block `` not found! \nEither the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. \nIf this problem persists, you may consider using `rdd.checkpoint()` instead, which is slower than local checkpointing but more fault-tolerant. \n### CLASS\\_UNSUPPORTED\\_BY\\_MAP\\_OBJECTS \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`MapObjects` does not support the class `` as resulting collection. \n### CLEANROOM\\_COMMANDS\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nClean Room commands are not supported \n### CLEANROOM\\_INVALID\\_SHARED\\_DATA\\_OBJECT\\_NAME \n[SQLSTATE: 42K05](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInvalid name to reference a `` inside a Clean Room. Use a ``\u2019s name inside the clean room following the format of [catalog].[schema].[``]. \nIf you are unsure about what name to use, you can run \u201cSHOW ALL IN CLEANROOM [clean\\_room]\u201d and use the value in the \u201cname\u201d column. \n### CLOUD\\_FILE\\_SOURCE\\_FILE\\_NOT\\_FOUND \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nA file notification was received for file: `` but it does not exist anymore. Please ensure that files are not deleted before they are processed. To continue your stream, you can set the Spark SQL configuration `` to true. 
\n### CODEC\\_NOT\\_AVAILABLE \n[SQLSTATE: 56038](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nThe codec `` is not available. Consider to set the config `` to ``. \n### CODEC\\_SHORT\\_NAME\\_NOT\\_FOUND \n[SQLSTATE: 42704](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot find a short name for the codec ``. \n### COLLATION\\_INVALID\\_NAME \n[SQLSTATE: 42704](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe value `` does not represent a correct collation name. Suggested valid collation name: [``]. \n### [COLLECTION\\_SIZE\\_LIMIT\\_EXCEEDED](https://docs.databricks.com/error-messages/collection-size-limit-exceeded-error-class.html) \n[SQLSTATE: 54000](https://docs.databricks.com/error-messages/sqlstates.html#class-54-program-limit-exceeded) \nCan\u2019t create array with `` elements which exceeding the array size limit ``, \nFor more details see [COLLECTION\\_SIZE\\_LIMIT\\_EXCEEDED](https://docs.databricks.com/error-messages/collection-size-limit-exceeded-error-class.html) \n### COLUMN\\_ALIASES\\_NOT\\_ALLOWED \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nColumn aliases are not allowed in ``. \n### COLUMN\\_ALREADY\\_EXISTS \n[SQLSTATE: 42711](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe column `` already exists. Choose another name or rename the existing column. \n### COLUMN\\_MASKS\\_CHECK\\_CONSTRAINT\\_UNSUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCreating CHECK constraint on table `` with column mask policies is not supported. 
\n### COLUMN\\_MASKS\\_DUPLICATE\\_USING\\_COLUMN\\_NAME \n[SQLSTATE: 42734](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nA `` statement attempted to assign a column mask policy to a column which included two or more other referenced columns in the USING COLUMNS list with the same name ``, which is invalid. \n### [COLUMN\\_MASKS\\_FEATURE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/column-masks-feature-not-supported-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nColumn mask policies for `` are not supported: \nFor more details see [COLUMN\\_MASKS\\_FEATURE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/column-masks-feature-not-supported-error-class.html) \n### COLUMN\\_MASKS\\_INCOMPATIBLE\\_SCHEMA\\_CHANGE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnable to `` `` from table `` because it\u2019s referenced in a column mask policy for column ``. The table owner must remove or alter this policy before proceeding. \n### COLUMN\\_MASKS\\_MERGE\\_UNSUPPORTED\\_SOURCE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nMERGE INTO operations do not support column mask policies in source table ``. \n### COLUMN\\_MASKS\\_MERGE\\_UNSUPPORTED\\_TARGET \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nMERGE INTO operations do not support writing into table `` with column mask policies. \n### COLUMN\\_MASKS\\_MULTI\\_PART\\_TARGET\\_COLUMN\\_NAME \n[SQLSTATE: 42K05](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThis statement attempted to assign a column mask policy to a column `` with multiple name parts, which is invalid. 
\n### COLUMN\\_MASKS\\_MULTI\\_PART\\_USING\\_COLUMN\\_NAME \n[SQLSTATE: 42K05](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThis statement attempted to assign a column mask policy to a column and the USING COLUMNS list included the name `` with multiple name parts, which is invalid. \n### COLUMN\\_MASKS\\_NOT\\_ENABLED \n[SQLSTATE: 56038](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nSupport for defining column masks is not enabled \n### COLUMN\\_MASKS\\_REQUIRE\\_UNITY\\_CATALOG \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nColumn mask policies are only supported in Unity Catalog. \n### COLUMN\\_MASKS\\_TABLE\\_CLONE\\_SOURCE\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` clone from table `` with column mask policies is not supported. \n### COLUMN\\_MASKS\\_TABLE\\_CLONE\\_TARGET\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` clone to table `` with column mask policies is not supported. \n### COLUMN\\_MASKS\\_UNSUPPORTED\\_CONSTANT\\_AS\\_PARAMETER \n[SQLSTATE: 0AKD1](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUsing a constant as a parameter in a column mask policy is not supported. Please update your SQL command to remove the constant from the column mask definition and then retry the command again. \n### COLUMN\\_MASKS\\_UNSUPPORTED\\_PROVIDER \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nFailed to execute `` command because assigning column mask policies is not supported for target data source with table provider: \u201c``\u201d. 
\n### COLUMN\\_MASKS\\_UNSUPPORTED\\_SUBQUERY \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot perform `` for table `` because it contains one or more column mask policies with subquery expression(s), which are not yet supported. Please contact the owner of the table to update the column mask policies in order to continue. \n### COLUMN\\_MASKS\\_USING\\_COLUMN\\_NAME\\_SAME\\_AS\\_TARGET\\_COLUMN \n[SQLSTATE: 42734](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe column `` had the same name as the target column, which is invalid; please remove the column from the USING COLUMNS list and retry the command. \n### COLUMN\\_NOT\\_DEFINED\\_IN\\_TABLE \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` column `` is not defined in table ``, defined table columns are: ``. \n### COLUMN\\_NOT\\_FOUND \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe column `` cannot be found. Verify the spelling and correctness of the column name according to the SQL config ``. \n### COMMA\\_PRECEDING\\_CONSTRAINT\\_ERROR \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnexpected \u2018,\u2019 before constraint(s) definition. Ensure that the constraint clause does not start with a comma when columns (and expectations) are not defined. \n### COMPARATOR\\_RETURNS\\_NULL \n[SQLSTATE: 22004](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThe comparator has returned a NULL for a comparison between `` and ``. \nIt should return a positive integer for \u201cgreater than\u201d, 0 for \u201cequal\u201d and a negative integer for \u201cless than\u201d. 
\nTo revert to deprecated behavior where NULL is treated as 0 (equal), you must set \u201cspark.sql.legacy.allowNullComparisonResultInArraySort\u201d to \u201ctrue\u201d. \n### CONCURRENT\\_QUERY \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nAnother instance of this query [id: ``] was just started by a concurrent session [existing runId: `` new runId: ``]. \n### CONCURRENT\\_STREAM\\_LOG\\_UPDATE \n[SQLSTATE: 40000](https://docs.databricks.com/error-messages/sqlstates.html#class-40-transaction-rollback) \nConcurrent update to the log. Multiple streaming jobs detected for ``. \nPlease make sure only one streaming job runs on a specific checkpoint location at a time. \n### CONFLICTING\\_PROVIDER \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThe specified provider `` is inconsistent with the existing catalog provider ``. Please use \u2018USING ``\u2019 and retry the command. \n### [CONNECT](https://docs.databricks.com/error-messages/connect-error-class.html) \n[SQLSTATE: 56K00](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nGeneric Spark Connect error. \nFor more details see [CONNECT](https://docs.databricks.com/error-messages/connect-error-class.html) \n### CONNECTION\\_ALREADY\\_EXISTS \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot create connection `` because it already exists. \nChoose a different name, drop or replace the existing connection, or add the IF NOT EXISTS clause to tolerate pre-existing connections. \n### CONNECTION\\_NAME\\_CANNOT\\_BE\\_EMPTY \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot execute this command because the connection name must be non-empty. 
\n### CONNECTION\\_NOT\\_FOUND \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot execute this command because the connection name `` was not found. \n### CONNECTION\\_OPTION\\_NOT\\_SUPPORTED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nConnections of type \u2018``\u2019 do not support the following option(s): ``. Supported options: ``. \n### CONNECTION\\_TYPE\\_NOT\\_SUPPORTED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot create connection of type \u2018``\u2019. Supported connection types: ``. \n### CONSTRAINTS\\_REQUIRE\\_UNITY\\_CATALOG \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nTable constraints are only supported in Unity Catalog. \n### CONVERSION\\_INVALID\\_INPUT \n[SQLSTATE: 22018](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThe value `` (``) cannot be converted to `` because it is malformed. Correct the value as per the syntax, or change its format. Use `` to tolerate malformed input and return NULL instead. \n### COPY\\_INTO\\_CREDENTIALS\\_NOT\\_ALLOWED\\_ON \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nInvalid scheme ``. COPY INTO source credentials currently only supports s3/s3n/s3a/wasbs/abfss. \n### COPY\\_INTO\\_CREDENTIALS\\_REQUIRED \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCOPY INTO source credentials must specify ``. 
\n### COPY\\_INTO\\_DUPLICATED\\_FILES\\_COPY\\_NOT\\_ALLOWED \n[SQLSTATE: 25000](https://docs.databricks.com/error-messages/sqlstates.html#class-25-invalid-transaction-state) \nDuplicated files were committed in a concurrent COPY INTO operation. Please try again later. \n### COPY\\_INTO\\_ENCRYPTION\\_NOT\\_ALLOWED\\_ON \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nInvalid scheme ``. COPY INTO source encryption currently only supports s3/s3n/s3a/abfss. \n### COPY\\_INTO\\_ENCRYPTION\\_NOT\\_SUPPORTED\\_FOR\\_AZURE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCOPY INTO encryption only supports ADLS Gen2, or abfss:// file scheme \n### COPY\\_INTO\\_ENCRYPTION\\_REQUIRED \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCOPY INTO source encryption must specify \u2018``\u2019. \n### COPY\\_INTO\\_ENCRYPTION\\_REQUIRED\\_WITH\\_EXPECTED \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInvalid encryption option ``. COPY INTO source encryption must specify \u2018``\u2019 = \u2018``\u2019. \n### COPY\\_INTO\\_NON\\_BLIND\\_APPEND\\_NOT\\_ALLOWED \n[SQLSTATE: 25000](https://docs.databricks.com/error-messages/sqlstates.html#class-25-invalid-transaction-state) \nCOPY INTO other than appending data is not allowed to run concurrently with other transactions. Please try again later. \n### COPY\\_INTO\\_ROCKSDB\\_MAX\\_RETRY\\_EXCEEDED \n[SQLSTATE: 25000](https://docs.databricks.com/error-messages/sqlstates.html#class-25-invalid-transaction-state) \nCOPY INTO failed to load its state, maximum retries exceeded. 
\n### COPY\\_INTO\\_SCHEMA\\_MISMATCH\\_WITH\\_TARGET\\_TABLE \n[SQLSTATE: 42KDG](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nA schema mismatch was detected while copying into the Delta table (Table: `
`). \nThis may indicate an issue with the incoming data, or the Delta table schema can be evolved automatically according to the incoming data by setting: \nCOPY\\_OPTIONS (\u2018mergeSchema\u2019 = \u2018true\u2019) \nSchema difference: \n`` \n### COPY\\_INTO\\_SOURCE\\_FILE\\_FORMAT\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe format of the source files must be one of CSV, JSON, AVRO, ORC, PARQUET, TEXT, or BINARYFILE. Using COPY INTO on Delta tables as the source is not supported as duplicate data may be ingested after OPTIMIZE operations. This check can be turned off by running the SQL command `set spark.databricks.delta.copyInto.formatCheck.enabled = false`. \n### COPY\\_INTO\\_SOURCE\\_SCHEMA\\_INFERENCE\\_FAILED \n[SQLSTATE: 42KD9](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe source directory did not contain any parsable files of type ``. Please check the contents of \u2018``\u2019. \n### [COPY\\_INTO\\_SYNTAX\\_ERROR](https://docs.databricks.com/error-messages/copy-into-syntax-error-error-class.html) \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to parse the COPY INTO command. \nFor more details see [COPY\\_INTO\\_SYNTAX\\_ERROR](https://docs.databricks.com/error-messages/copy-into-syntax-error-error-class.html) \n### CREATE\\_OR\\_REFRESH\\_MV\\_ST\\_ASYNC \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot CREATE OR REFRESH Materialized Views or Streaming Tables with ASYNC specified. Please remove ASYNC from the CREATE OR REFRESH statement or use REFRESH ASYNC to refresh existing Materialized Views or Streaming Tables asynchronously. 
\n### CREATE\\_PERMANENT\\_VIEW\\_WITHOUT\\_ALIAS \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nNot allowed to create the permanent view `` without explicitly assigning an alias for the expression ``. \n### CREATE\\_TABLE\\_COLUMN\\_DESCRIPTOR\\_DUPLICATE \n[SQLSTATE: 42710](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCREATE TABLE column `` specifies descriptor \u201c``\u201d more than once, which is invalid. \n### [CREATE\\_VIEW\\_COLUMN\\_ARITY\\_MISMATCH](https://docs.databricks.com/error-messages/create-view-column-arity-mismatch-error-class.html) \n[SQLSTATE: 21S01](https://docs.databricks.com/error-messages/sqlstates.html#class-21-cardinality-violation) \nCannot create view ``, the reason is \nFor more details see [CREATE\\_VIEW\\_COLUMN\\_ARITY\\_MISMATCH](https://docs.databricks.com/error-messages/create-view-column-arity-mismatch-error-class.html) \n### CREDENTIAL\\_MISSING \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPlease provide credentials when creating or updating external locations. \n### CSV\\_ENFORCE\\_SCHEMA\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe CSV option `enforceSchema` cannot be set when using `rescuedDataColumn` or `failOnUnknownFields`, as columns are read by name rather than ordinal. \n### CYCLIC\\_FUNCTION\\_REFERENCE \n[SQLSTATE: 42887](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCyclic function reference detected: ``. 
\n### DATABRICKS\\_DELTA\\_NOT\\_ENABLED \n[SQLSTATE: 56038](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nDatabricks Delta is not enabled in your account.`` \n### [DATATYPE\\_MISMATCH](https://docs.databricks.com/error-messages/datatype-mismatch-error-class.html) \n[SQLSTATE: 42K09](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot resolve `` due to data type mismatch: \nFor more details see [DATATYPE\\_MISMATCH](https://docs.databricks.com/error-messages/datatype-mismatch-error-class.html) \n### DATATYPE\\_MISSING\\_SIZE \n[SQLSTATE: 42K01](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nDataType `` requires a length parameter, for example ``(10). Please specify the length. \n### DATA\\_LINEAGE\\_SECURE\\_VIEW\\_LEAF\\_NODE\\_HAS\\_NO\\_RELATION \n[SQLSTATE: 25000](https://docs.databricks.com/error-messages/sqlstates.html#class-25-invalid-transaction-state) \nWrite Lineage unsuccessful: missing corresponding relation with policies for CLM/RLS. \n### DATA\\_SOURCE\\_ALREADY\\_EXISTS \n[SQLSTATE: 42710](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nData source \u2018``\u2019 already exists. Please choose a different name for the new data source. \n### DATA\\_SOURCE\\_NOT\\_EXIST \n[SQLSTATE: 42704](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nData source \u2018``\u2019 not found. Please make sure the data source is registered. \n### DATA\\_SOURCE\\_NOT\\_FOUND \n[SQLSTATE: 42K02](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to find the data source: ``. Make sure the provider name is correct and the package is properly registered and compatible with your Spark version. 
\n### DATA\\_SOURCE\\_OPTION\\_CONTAINS\\_INVALID\\_CHARACTERS \n[SQLSTATE: 42602](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nOption `
`\u2019s column `` with type `` to `` with type ``. \n### NOT\\_SUPPORTED\\_COMMAND\\_FOR\\_V2\\_TABLE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` is not supported for v2 tables. \n### NOT\\_SUPPORTED\\_COMMAND\\_WITHOUT\\_HIVE\\_SUPPORT \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` is not supported, if you want to enable it, please set \u201cspark.sql.catalogImplementation\u201d to \u201chive\u201d. \n### [NOT\\_SUPPORTED\\_IN\\_JDBC\\_CATALOG](https://docs.databricks.com/error-messages/not-supported-in-jdbc-catalog-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nNot supported command in JDBC catalog: \nFor more details see [NOT\\_SUPPORTED\\_IN\\_JDBC\\_CATALOG](https://docs.databricks.com/error-messages/not-supported-in-jdbc-catalog-error-class.html) \n### NOT\\_SUPPORTED\\_WITH\\_DB\\_SQL \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` is not supported on a SQL ``. \n### NO\\_DEFAULT\\_COLUMN\\_VALUE\\_AVAILABLE \n[SQLSTATE: 42608](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCan\u2019t determine the default value for `` since it is not nullable and it has no default value. \n### NO\\_HANDLER\\_FOR\\_UDAF \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nNo handler for UDAF \u2018``\u2019. Use sparkSession.udf.register(\u2026) instead. \n### NO\\_MERGE\\_ACTION\\_SPECIFIED \n[SQLSTATE: 42K0E](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \ndf.mergeInto needs to be followed by at least one of whenMatched/whenNotMatched/whenNotMatchedBySource. 
\n### NO\\_PARENT\\_EXTERNAL\\_LOCATION\\_FOR\\_PATH \nSQLSTATE: none assigned \nNo parent external location was found for path \u2018``\u2019. Please create an external location on one of the parent paths and then retry the query or command again. \n### NO\\_SQL\\_TYPE\\_IN\\_PROTOBUF\\_SCHEMA \n[SQLSTATE: 42S22](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot find `` in Protobuf schema. \n### NO\\_STORAGE\\_LOCATION\\_FOR\\_TABLE \nSQLSTATE: none assigned \nNo storage location was found for table \u2018``\u2019 when generating table credentials. Please verify the table type and the table location URL and then retry the query or command again. \n### NO\\_SUCH\\_CATALOG\\_EXCEPTION \nSQLSTATE: none assigned \nCatalog \u2018``\u2019 was not found. Please verify the catalog name and then retry the query or command again. \n### NO\\_SUCH\\_CLEANROOM\\_EXCEPTION \nSQLSTATE: none assigned \nThe clean room \u2018``\u2019 does not exist. Please verify that the clean room name is spelled correctly and matches the name of a valid existing clean room and then retry the query or command again. \n### NO\\_SUCH\\_EXTERNAL\\_LOCATION\\_EXCEPTION \nSQLSTATE: none assigned \nThe external location \u2018``\u2019 does not exist. Please verify that the external location name is correct and then retry the query or command again. \n### NO\\_SUCH\\_METASTORE\\_EXCEPTION \nSQLSTATE: none assigned \nThe metastore was not found. Please ask your account administrator to assign a metastore to the current workspace and then retry the query or command again. \n### NO\\_SUCH\\_PROVIDER\\_EXCEPTION \nSQLSTATE: none assigned \nThe share provider \u2018``\u2019 does not exist. Please verify the share provider name is spelled correctly and matches the name of a valid existing provider name and then retry the query or command again. 
\n### NO\\_SUCH\\_RECIPIENT\\_EXCEPTION \nSQLSTATE: none assigned \nThe recipient \u2018``\u2019 does not exist. Please verify that the recipient name is spelled correctly and matches the name of a valid existing recipient and then retry the query or command again. \n### NO\\_SUCH\\_SHARE\\_EXCEPTION \nSQLSTATE: none assigned \nThe share \u2018``\u2019 does not exist. Please verify that the share name is spelled correctly and matches the name of a valid existing share and then retry the query or command again. \n### NO\\_SUCH\\_STORAGE\\_CREDENTIAL\\_EXCEPTION \nSQLSTATE: none assigned \nThe storage credential \u2018``\u2019 does not exist. Please verify that the storage credential name is spelled correctly and matches the name of a valid existing storage credential and then retry the query or command again. \n### NO\\_SUCH\\_USER\\_EXCEPTION \nSQLSTATE: none assigned \nThe user \u2018``\u2019 does not exist. Please verify that the user to whom you grant permission or alter ownership is spelled correctly and matches the name of a valid existing user and then retry the query or command again. \n### NO\\_UDF\\_INTERFACE \n[SQLSTATE: 38000](https://docs.databricks.com/error-messages/sqlstates.html#class-38-external-routine-exception) \nUDF class `` doesn\u2019t implement any UDF interface. \n### NULLABLE\\_COLUMN\\_OR\\_FIELD \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nColumn or field `` is nullable while it\u2019s required to be non-nullable. \n### NULLABLE\\_ROW\\_ID\\_ATTRIBUTES \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nRow ID attributes cannot be nullable: ``. \n### NULL\\_MAP\\_KEY \n[SQLSTATE: 2200E](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot use null as map key. 
\n### NUMERIC\\_OUT\\_OF\\_SUPPORTED\\_RANGE \n[SQLSTATE: 22003](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThe value `` cannot be interpreted as a numeric since it has more than 38 digits. \n### NUMERIC\\_VALUE\\_OUT\\_OF\\_RANGE \n[SQLSTATE: 22003](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \n`` cannot be represented as Decimal(``, ``). If necessary set `` to \u201cfalse\u201d to bypass this error, and return NULL instead. \n### NUM\\_COLUMNS\\_MISMATCH \n[SQLSTATE: 42826](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` can only be performed on inputs with the same number of columns, but the first input has `` columns and the `` input has `` columns. \n### NUM\\_TABLE\\_VALUE\\_ALIASES\\_MISMATCH \n[SQLSTATE: 42826](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nNumber of given aliases does not match number of output columns. \nFunction name: ``; number of aliases: ``; number of output columns: ``. \n### OAUTH\\_CUSTOM\\_IDENTITY\\_CLAIM\\_NOT\\_PROVIDED \n[SQLSTATE: 22KD2](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nNo custom identity claim was provided. \n### ONLY\\_SECRET\\_FUNCTION\\_SUPPORTED\\_HERE \n[SQLSTATE: 42K0E](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCalling function `` is not supported in this ``; `` supported here. \n### OPERATION\\_CANCELED \n[SQLSTATE: HY008](https://docs.databricks.com/error-messages/sqlstates.html#class-hy-cli-specific-condition) \nOperation has been canceled. \n### OPERATION\\_REQUIRES\\_UNITY\\_CATALOG \n[SQLSTATE: 0AKUD](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nOperation `` requires Unity Catalog enabled. 
\n### OP\\_NOT\\_SUPPORTED\\_READ\\_ONLY \n[SQLSTATE: 42KD1](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` is not supported in read-only session mode. \n### ORDER\\_BY\\_POS\\_OUT\\_OF\\_RANGE \n[SQLSTATE: 42805](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nORDER BY position `` is not in select list (valid range is [1, ``]). \n### PARSE\\_EMPTY\\_STATEMENT \n[SQLSTATE: 42617](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nSyntax error, unexpected empty statement. \n### PARSE\\_SYNTAX\\_ERROR \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nSyntax error at or near `` ``. \n### PARTITIONS\\_ALREADY\\_EXIST \n[SQLSTATE: 428FT](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot ADD or RENAME TO partition(s) `` in table `` because they already exist. \nChoose a different name, drop the existing partition, or add the IF NOT EXISTS clause to tolerate a pre-existing partition. \n### PARTITIONS\\_NOT\\_FOUND \n[SQLSTATE: 428FT](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe partition(s) `` cannot be found in table ``. \nVerify the partition specification and table name. \nTo tolerate the error on drop use ALTER TABLE \u2026 DROP IF EXISTS PARTITION. \n### PARTITION\\_LOCATION\\_ALREADY\\_EXISTS \n[SQLSTATE: 42K04](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPartition location `` already exists in table ``. 
\n### PARTITION\\_LOCATION\\_IS\\_NOT\\_UNDER\\_TABLE\\_DIRECTORY \n[SQLSTATE: 42KD5](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to execute the ALTER TABLE SET PARTITION LOCATION statement, because the \npartition location `` is not under the table directory ``. \nTo fix it, please set the location of partition to a subdirectory of ``. 
\n### PARTITION\\_METADATA \n[SQLSTATE: 0AKUC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` is not allowed on table `` since storing partition metadata is not supported in Unity Catalog. \n### PATH\\_ALREADY\\_EXISTS \n[SQLSTATE: 42K04](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPath `` already exists. Set mode as \u201coverwrite\u201d to overwrite the existing path. \n### PATH\\_NOT\\_FOUND \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPath does not exist: ``. \n### PIVOT\\_VALUE\\_DATA\\_TYPE\\_MISMATCH \n[SQLSTATE: 42K09](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInvalid pivot value \u2018``\u2019: value data type `` does not match pivot column data type ``. \n### PROTOBUF\\_DEPENDENCY\\_NOT\\_FOUND \n[SQLSTATE: 42K0G](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCould not find dependency: ``. \n### PROTOBUF\\_DESCRIPTOR\\_FILE\\_NOT\\_FOUND \n[SQLSTATE: 42K0G](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nError reading Protobuf descriptor file at path: ``. \n### PROTOBUF\\_FIELD\\_MISSING \n[SQLSTATE: 42K0G](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nSearching for `` in Protobuf schema at `` gave `` matches. Candidates: ``. \n### PROTOBUF\\_FIELD\\_MISSING\\_IN\\_SQL\\_SCHEMA \n[SQLSTATE: 42K0G](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFound `` in Protobuf schema but there is no match in the SQL schema. 
\n### PROTOBUF\\_FIELD\\_TYPE\\_MISMATCH \n[SQLSTATE: 42K0G](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nType mismatch encountered for field: ``. \n### PROTOBUF\\_JAVA\\_CLASSES\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nJava classes are not supported for ``. Contact Databricks Support about alternate options. \n### PROTOBUF\\_MESSAGE\\_NOT\\_FOUND \n[SQLSTATE: 42K0G](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to locate Message `` in Descriptor. \n### PROTOBUF\\_TYPE\\_NOT\\_SUPPORT \n[SQLSTATE: 42K0G](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nProtobuf type not yet supported: ``. \n### PS\\_FETCH\\_RETRY\\_EXCEPTION \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nTask in pubsub fetch stage cannot be retried. Partition `` in stage ``, TID ``. \n### PS\\_INVALID\\_EMPTY\\_OPTION \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` cannot be an empty string. \n### PS\\_INVALID\\_KEY\\_TYPE \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nInvalid key type for PubSub dedup: ``. \n### PS\\_INVALID\\_OPTION \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe option `` is not supported by PubSub. It can only be used in testing. \n### PS\\_INVALID\\_OPTION\\_TYPE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInvalid type for ``. Expected type of `` to be type ``. 
\n### PS\\_INVALID\\_READ\\_LIMIT \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInvalid read limit on PubSub stream: ``. \n### PS\\_INVALID\\_UNSAFE\\_ROW\\_CONVERSION\\_FROM\\_PROTO \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nInvalid UnsafeRow to decode to PubSubMessageMetadata, the desired proto schema is: ``. The input UnsafeRow might be corrupted: ``. \n### PS\\_MISSING\\_AUTH\\_INFO \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to find complete PubSub authentication information. \n### PS\\_MISSING\\_REQUIRED\\_OPTION \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCould not find required option: ``. \n### PS\\_MOVING\\_CHECKPOINT\\_FAILURE \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFail to move raw data checkpoint files from `` to destination directory: ``. \n### PS\\_MULTIPLE\\_FAILED\\_EPOCHS \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nPubSub stream cannot be started as there is more than one failed fetch: ``. \n### PS\\_OPTION\\_NOT\\_IN\\_BOUNDS \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \n`` must be within the following bounds (``, ``) exclusive of both bounds. \n### PS\\_PROVIDE\\_CREDENTIALS\\_WITH\\_OPTION \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nShared clusters do not support authentication with instance profiles. Provide credentials to the stream directly using .option(). 
\n### PS\\_SPARK\\_SPECULATION\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nPubSub source connector is only available in cluster with `spark.speculation` disabled. \n### PS\\_UNABLE\\_TO\\_CREATE\\_SUBSCRIPTION \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nAn error occurred while trying to create subscription `` on topic ``. Please check that there are sufficient permissions to create a subscription and try again. \n### PS\\_UNABLE\\_TO\\_PARSE\\_PROTO \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nUnable to parse serialized bytes to generate proto. \n### PS\\_UNSUPPORTED\\_GET\\_OFFSET\\_CALL \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \ngetOffset is not supported without supplying a limit. \n### PYTHON\\_DATA\\_SOURCE\\_ERROR \n[SQLSTATE: 38000](https://docs.databricks.com/error-messages/sqlstates.html#class-38-external-routine-exception) \nFailed to `` Python data source ``: `` \n### [QUERIED\\_TABLE\\_INCOMPATIBLE\\_WITH\\_COLUMN\\_MASK\\_POLICY](https://docs.databricks.com/error-messages/queried-table-incompatible-with-column-mask-policy-error-class.html) \n[SQLSTATE: 428HD](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to access referenced table because a previously assigned column mask is currently incompatible with the table schema; to continue, please contact the owner of the table to update the policy: \nFor more details see [QUERIED\\_TABLE\\_INCOMPATIBLE\\_WITH\\_COLUMN\\_MASK\\_POLICY](https://docs.databricks.com/error-messages/queried-table-incompatible-with-column-mask-policy-error-class.html) \n### 
[QUERIED\\_TABLE\\_INCOMPATIBLE\\_WITH\\_ROW\\_LEVEL\\_SECURITY\\_POLICY](https://docs.databricks.com/error-messages/queried-table-incompatible-with-row-level-security-policy-error-class.html) \n[SQLSTATE: 428HD](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to access referenced table because a previously assigned row level security policy is currently incompatible with the table schema; to continue, please contact the owner of the table to update the policy: \nFor more details see [QUERIED\\_TABLE\\_INCOMPATIBLE\\_WITH\\_ROW\\_LEVEL\\_SECURITY\\_POLICY](https://docs.databricks.com/error-messages/queried-table-incompatible-with-row-level-security-policy-error-class.html) \n### READ\\_FILES\\_AMBIGUOUS\\_ROUTINE\\_PARAMETERS \n[SQLSTATE: 4274K](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe invocation of function `` has `` and `` set, which are aliases of each other. Please set only one of them. \n### READ\\_TVF\\_UNEXPECTED\\_REQUIRED\\_PARAMETER \n[SQLSTATE: 4274K](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe function `` required parameter `` must be assigned at position `` without the name. \n### RECURSIVE\\_PROTOBUF\\_SCHEMA \n[SQLSTATE: 42K0G](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFound recursive reference in Protobuf schema, which can not be processed by Spark by default: ``. try setting the option `recursive.fields.max.depth` 0 to 10. Going beyond 10 levels of recursion is not allowed. \n### RECURSIVE\\_VIEW \n[SQLSTATE: 42K0H](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nRecursive view `` detected (cycle: ``). 
\n### REF\\_DEFAULT\\_VALUE\\_IS\\_NOT\\_ALLOWED\\_IN\\_PARTITION \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nReferences to DEFAULT column values are not allowed within the PARTITION clause. \n### RELATION\\_LARGER\\_THAN\\_8G \n[SQLSTATE: 54000](https://docs.databricks.com/error-messages/sqlstates.html#class-54-program-limit-exceeded) \nCan not build a `` that is larger than 8G. \n### REMOTE\\_FUNCTION\\_HTTP\\_FAILED\\_ERROR \n[SQLSTATE: 57012](https://docs.databricks.com/error-messages/sqlstates.html#class-57-operator-intervention) \nThe remote HTTP request failed with code ``, and error message `` \n### REMOTE\\_FUNCTION\\_HTTP\\_RESULT\\_PARSE\\_ERROR \n[SQLSTATE: 22032](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to evaluate the `` SQL function due to inability to parse the JSON result from the remote HTTP response; the error message is ``. Check API documentation: ``. Please fix the problem indicated in the error message and retry the query again. \n### REMOTE\\_FUNCTION\\_HTTP\\_RESULT\\_UNEXPECTED\\_ERROR \n[SQLSTATE: 57012](https://docs.databricks.com/error-messages/sqlstates.html#class-57-operator-intervention) \nFailed to evaluate the `` SQL function due to inability to process the unexpected remote HTTP response; the error message is ``. Check API documentation: ``. Please fix the problem indicated in the error message and retry the query again. 
\n### REMOTE\\_FUNCTION\\_HTTP\\_RETRY\\_TIMEOUT \n[SQLSTATE: 57012](https://docs.databricks.com/error-messages/sqlstates.html#class-57-operator-intervention) \nThe remote request failed after retrying `` times; the last failed HTTP error code was `` and the message was `` \n### REMOTE\\_FUNCTION\\_MISSING\\_REQUIREMENTS\\_ERROR \n[SQLSTATE: 57012](https://docs.databricks.com/error-messages/sqlstates.html#class-57-operator-intervention) \nFailed to evaluate the `` SQL function because ``. Check requirements in ``. Please fix the problem indicated in the error message and retry the query again. \n### RENAME\\_SRC\\_PATH\\_NOT\\_FOUND \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to rename as `` was not found. \n### REPEATED\\_CLAUSE \n[SQLSTATE: 42614](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe `` clause may be used at most once per `` operation. \n### REQUIRED\\_PARAMETER\\_NOT\\_FOUND \n[SQLSTATE: 4274K](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot invoke function `` because the parameter named `` is required, but the function call did not supply a value. Please update the function call to supply an argument value (either positionally at index `` or by name) and retry the query again. \n### REQUIRES\\_SINGLE\\_PART\\_NAMESPACE \n[SQLSTATE: 42K05](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` requires a single-part namespace, but got ``. \n### RESERVED\\_CDC\\_COLUMNS\\_ON\\_WRITE \n[SQLSTATE: 42939](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe write contains reserved columns `` that are used \ninternally as metadata for Change Data Feed. 
To write to the table either rename/drop \nthese columns or disable Change Data Feed on the table by setting \n`` to false. \n### [RESTRICTED\\_STREAMING\\_OPTION\\_PERMISSION\\_ENFORCED](https://docs.databricks.com/error-messages/restricted-streaming-option-permission-enforced-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe option `
`. \n### [UNSUPPORTED\\_ADD\\_FILE](https://docs.databricks.com/error-messages/unsupported-add-file-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDon\u2019t support add file. \nFor more details see [UNSUPPORTED\\_ADD\\_FILE](https://docs.databricks.com/error-messages/unsupported-add-file-error-class.html) \n### UNSUPPORTED\\_ARROWTYPE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnsupported arrow type ``. \n### UNSUPPORTED\\_BATCH\\_TABLE\\_VALUED\\_FUNCTION \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe function `` does not support batch queries. \n### UNSUPPORTED\\_CHAR\\_OR\\_VARCHAR\\_AS\\_STRING \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe char/varchar type can\u2019t be used in the table schema. \nIf you want Spark treat them as string type as same as Spark 3.0 and earlier, please set \u201cspark.sql.legacy.charVarcharAsString\u201d to \u201ctrue\u201d. \n### UNSUPPORTED\\_CLAUSE\\_FOR\\_OPERATION \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe `` is not supported for ``. \n### UNSUPPORTED\\_COMMON\\_ANCESTOR\\_LOC\\_FOR\\_FILE\\_STREAM\\_SOURCE \n[SQLSTATE: 42616](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe common ancestor of source path and sourceArchiveDir should be registered with UC. \nIf you see this error message, it\u2019s likely that you register the source path and sourceArchiveDir in different external locations. \nPlease put them into a single external location. 
\n### UNSUPPORTED\\_CONSTRAINT\\_CLAUSES \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nConstraint clauses `` are unsupported. \n### UNSUPPORTED\\_CONSTRAINT\\_TYPE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnsupported constraint type. Only `` are supported \n### UNSUPPORTED\\_DATASOURCE\\_FOR\\_DIRECT\\_QUERY \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnsupported data source type for direct query on files: `` \n### UNSUPPORTED\\_DATATYPE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnsupported data type ``. \n### UNSUPPORTED\\_DATA\\_SOURCE\\_SAVE\\_MODE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe data source \u201c``\u201d cannot be written in the `` mode. Please use either the \u201cAppend\u201d or \u201cOverwrite\u201d mode instead. \n### UNSUPPORTED\\_DATA\\_TYPE\\_FOR\\_DATASOURCE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe `` datasource doesn\u2019t support the column `` of the type ``. \n### [UNSUPPORTED\\_DEFAULT\\_VALUE](https://docs.databricks.com/error-messages/unsupported-default-value-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDEFAULT column values is not supported. 
\nFor more details see [UNSUPPORTED\\_DEFAULT\\_VALUE](https://docs.databricks.com/error-messages/unsupported-default-value-error-class.html) \n### [UNSUPPORTED\\_DESERIALIZER](https://docs.databricks.com/error-messages/unsupported-deserializer-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe deserializer is not supported: \nFor more details see [UNSUPPORTED\\_DESERIALIZER](https://docs.databricks.com/error-messages/unsupported-deserializer-error-class.html) \n### UNSUPPORTED\\_EXPRESSION\\_GENERATED\\_COLUMN \n[SQLSTATE: 42621](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot create generated column `` with generation expression `` because ``. \n### UNSUPPORTED\\_EXPR\\_FOR\\_OPERATOR \n[SQLSTATE: 42K0E](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nA query operator contains one or more unsupported expressions. \nConsider to rewrite it to avoid window functions, aggregate functions, and generator functions in the WHERE clause. \nInvalid expressions: [``] \n### UNSUPPORTED\\_EXPR\\_FOR\\_PARAMETER \n[SQLSTATE: 42K0E](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nA query parameter contains unsupported expression. \nParameters can either be variables or literals. \nInvalid expression: [``] \n### UNSUPPORTED\\_EXPR\\_FOR\\_WINDOW \n[SQLSTATE: 42P20](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nExpression `` not supported within a window function. 
\n### [UNSUPPORTED\\_FEATURE](https://docs.databricks.com/error-messages/unsupported-feature-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe feature is not supported: \nFor more details see [UNSUPPORTED\\_FEATURE](https://docs.databricks.com/error-messages/unsupported-feature-error-class.html) \n### UNSUPPORTED\\_FN\\_TYPE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnsupported user defined function type: `` \n### [UNSUPPORTED\\_GENERATOR](https://docs.databricks.com/error-messages/unsupported-generator-error-class.html) \n[SQLSTATE: 42K0E](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe generator is not supported: \nFor more details see [UNSUPPORTED\\_GENERATOR](https://docs.databricks.com/error-messages/unsupported-generator-error-class.html) \n### UNSUPPORTED\\_GROUPING\\_EXPRESSION \n[SQLSTATE: 42K0E](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \ngrouping()/grouping\\_id() can only be used with GroupingSets/Cube/Rollup. \n### UNSUPPORTED\\_INITIAL\\_POSITION\\_AND\\_TRIGGER\\_PAIR\\_FOR\\_KINESIS\\_SOURCE \n[SQLSTATE: 42616](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` with initial position `` is not supported with the Kinesis source \n### [UNSUPPORTED\\_INSERT](https://docs.databricks.com/error-messages/unsupported-insert-error-class.html) \n[SQLSTATE: 42809](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCan\u2019t insert into the target. 
\nFor more details see [UNSUPPORTED\\_INSERT](https://docs.databricks.com/error-messages/unsupported-insert-error-class.html) \n### UNSUPPORTED\\_MANAGED\\_TABLE\\_CREATION \n[SQLSTATE: 0AKDD](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCreating a managed table `` using datasource `` is not supported. You need to use datasource DELTA or create an external table using CREATE EXTERNAL TABLE `` \u2026 USING `` \u2026 \n### [UNSUPPORTED\\_MERGE\\_CONDITION](https://docs.databricks.com/error-messages/unsupported-merge-condition-error-class.html) \n[SQLSTATE: 42K0E](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nMERGE operation contains unsupported `` condition. \nFor more details see [UNSUPPORTED\\_MERGE\\_CONDITION](https://docs.databricks.com/error-messages/unsupported-merge-condition-error-class.html) \n### UNSUPPORTED\\_NESTED\\_ROW\\_OR\\_COLUMN\\_ACCESS\\_POLICY \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nTable `` has a row level security policy or column mask which indirectly refers to another table with a row level security policy or column mask; this is not supported. Call sequence: `` \n### [UNSUPPORTED\\_OVERWRITE](https://docs.databricks.com/error-messages/unsupported-overwrite-error-class.html) \n[SQLSTATE: 42902](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCan\u2019t overwrite the target that is also being read from. 
\nFor more details see [UNSUPPORTED\\_OVERWRITE](https://docs.databricks.com/error-messages/unsupported-overwrite-error-class.html) \n### [UNSUPPORTED\\_SAVE\\_MODE](https://docs.databricks.com/error-messages/unsupported-save-mode-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe save mode `` is not supported for: \nFor more details see [UNSUPPORTED\\_SAVE\\_MODE](https://docs.databricks.com/error-messages/unsupported-save-mode-error-class.html) \n### [UNSUPPORTED\\_STREAMING\\_OPTIONS\\_FOR\\_VIEW](https://docs.databricks.com/error-messages/unsupported-streaming-options-for-view-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnsupported for streaming a view. Reason: \nFor more details see [UNSUPPORTED\\_STREAMING\\_OPTIONS\\_FOR\\_VIEW](https://docs.databricks.com/error-messages/unsupported-streaming-options-for-view-error-class.html) \n### UNSUPPORTED\\_STREAMING\\_OPTIONS\\_PERMISSION\\_ENFORCED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nStreaming options `` are not supported for data source `` on a shared cluster. \n### UNSUPPORTED\\_STREAMING\\_SINK\\_PERMISSION\\_ENFORCED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nData source `` is not supported as a streaming sink on a shared cluster. \n### UNSUPPORTED\\_STREAMING\\_SOURCE\\_PERMISSION\\_ENFORCED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nData source `` is not supported as a streaming source on a shared cluster. \n### UNSUPPORTED\\_STREAMING\\_TABLE\\_VALUED\\_FUNCTION \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe function `` does not support streaming. 
Please remove the STREAM keyword \n### UNSUPPORTED\\_STREAM\\_READ\\_LIMIT\\_FOR\\_KINESIS\\_SOURCE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` is not supported with the Kinesis source \n### [UNSUPPORTED\\_SUBQUERY\\_EXPRESSION\\_CATEGORY](https://docs.databricks.com/error-messages/unsupported-subquery-expression-category-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnsupported subquery expression: \nFor more details see [UNSUPPORTED\\_SUBQUERY\\_EXPRESSION\\_CATEGORY](https://docs.databricks.com/error-messages/unsupported-subquery-expression-category-error-class.html) \n### UNSUPPORTED\\_TIMESERIES\\_COLUMNS \n[SQLSTATE: 56038](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nCreating primary key with timeseries columns is not supported \n### UNSUPPORTED\\_TIMESERIES\\_WITH\\_MORE\\_THAN\\_ONE\\_COLUMN \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCreating primary key with more than one timeseries column `` is not supported \n### UNSUPPORTED\\_TRIGGER\\_FOR\\_KINESIS\\_SOURCE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` is not supported with the Kinesis source \n### UNSUPPORTED\\_TYPED\\_LITERAL \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nLiterals of the type `` are not supported. Supported types are ``. \n### UNTYPED\\_SCALA\\_UDF \n[SQLSTATE: 42K0E](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nYou\u2019re using untyped Scala UDF, which does not have the input type information. 
\nSpark may blindly pass null to the Scala closure with primitive-type argument, and the closure will see the default value of the Java type for the null argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. To get rid of this error, you could: \n1. use typed Scala UDF APIs(without return type parameter), e.g. `udf((x: Int) => x)`.\n2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { override def call(s: String): Integer = s.length() }, IntegerType)`, if input types are all non primitive.\n3. set \u201cspark.sql.legacy.allowUntypedScalaUDF\u201d to \u201ctrue\u201d and use this API with caution. \n### [UPGRADE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/upgrade-not-supported-error-class.html) \n[SQLSTATE: 0AKUC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nTable is not eligible for upgrade from Hive Metastore to Unity Catalog. Reason: \nFor more details see [UPGRADE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/upgrade-not-supported-error-class.html) \n### [USER\\_DEFINED\\_FUNCTIONS](https://docs.databricks.com/error-messages/user-defined-functions-error-class.html) \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUser defined function is invalid: \nFor more details see [USER\\_DEFINED\\_FUNCTIONS](https://docs.databricks.com/error-messages/user-defined-functions-error-class.html) \n### USER\\_RAISED\\_EXCEPTION \n[SQLSTATE: P0001](https://docs.databricks.com/error-messages/sqlstates.html#class-p0-procedural-logic-error) \n`` \n### USER\\_RAISED\\_EXCEPTION\\_PARAMETER\\_MISMATCH \n[SQLSTATE: P0001](https://docs.databricks.com/error-messages/sqlstates.html#class-p0-procedural-logic-error) \nThe `raise_error()` function was used to raise error class: `` which expects parameters: ``. \nThe provided parameters `` do not match the expected parameters. 
\nPlease make sure to provide all expected parameters. \n### USER\\_RAISED\\_EXCEPTION\\_UNKNOWN\\_ERROR\\_CLASS \n[SQLSTATE: P0001](https://docs.databricks.com/error-messages/sqlstates.html#class-p0-procedural-logic-error) \nThe `raise_error()` function was used to raise an unknown error class: `` \n### VARIABLE\\_ALREADY\\_EXISTS \n[SQLSTATE: 42723](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot create the variable `` because it already exists. \nChoose a different name, or drop or replace the existing variable. \n### VARIABLE\\_NOT\\_FOUND \n[SQLSTATE: 42883](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe variable `` cannot be found. Verify the spelling and correctness of the schema and catalog. \nIf you did not qualify the name with a schema and catalog, verify the current\\_schema() output, or qualify the name with the correct schema and catalog. \nTo tolerate the error on drop use DROP VARIABLE IF EXISTS. \n### VARIANT\\_SIZE\\_LIMIT \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot build variant bigger than `` in ``. \nPlease avoid large input strings to this expression (for example, add function calls(s) to check the expression size and convert it to NULL first if it is too big). \n### VIEW\\_ALREADY\\_EXISTS \n[SQLSTATE: 42P07](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot create view `` because it already exists. \nChoose a different name, drop or replace the existing object, or add the IF NOT EXISTS clause to tolerate pre-existing objects. \n### VIEW\\_EXCEED\\_MAX\\_NESTED\\_DEPTH \n[SQLSTATE: 54K00](https://docs.databricks.com/error-messages/sqlstates.html#class-54-program-limit-exceeded) \nThe depth of view `` exceeds the maximum view resolution depth (``). 
\nAnalysis is aborted to avoid errors. If you want to work around this, please try to increase the value of \u201cspark.sql.view.maxNestedViewDepth\u201d. \n### VIEW\\_NOT\\_FOUND \n[SQLSTATE: 42P01](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe view `` cannot be found. Verify the spelling and correctness of the schema and catalog. \nIf you did not qualify the name with a schema, verify the current\\_schema() output, or qualify the name with the correct schema and catalog. \nTo tolerate the error on drop use DROP VIEW IF EXISTS. \n### VOLUME\\_ALREADY\\_EXISTS \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot create volume `` because it already exists. \nChoose a different name, drop or replace the existing object, or add the IF NOT EXISTS clause to tolerate pre-existing objects. \n### WINDOW\\_FUNCTION\\_AND\\_FRAME\\_MISMATCH \n[SQLSTATE: 42K0E](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` function can only be evaluated in an ordered row-based window frame with a single offset: ``. \n### WINDOW\\_FUNCTION\\_WITHOUT\\_OVER\\_CLAUSE \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nWindow function `` requires an OVER clause. \n### WITH\\_CREDENTIAL \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nWITH CREDENTIAL syntax is not supported for ``. \n### WRITE\\_STREAM\\_NOT\\_ALLOWED \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`writeStream` can be called only on streaming Dataset/DataFrame. 
\n### WRONG\\_COLUMN\\_DEFAULTS\\_FOR\\_DELTA\\_ALTER\\_TABLE\\_ADD\\_COLUMN\\_NOT\\_SUPPORTED \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nFailed to execute the command because DEFAULT values are not supported when adding new \ncolumns to previously existing Delta tables; please add the column without a default \nvalue first, then run a second ALTER TABLE ALTER COLUMN SET DEFAULT command to apply \nfor future inserted rows instead. \n### WRONG\\_COLUMN\\_DEFAULTS\\_FOR\\_DELTA\\_FEATURE\\_NOT\\_ENABLED \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nFailed to execute `` command because it assigned a column DEFAULT value, \nbut the corresponding table feature was not enabled. Please retry the command again \nafter executing ALTER TABLE tableName SET \nTBLPROPERTIES(\u2018delta.feature.allowColumnDefaults\u2019 = \u2018supported\u2019). \n### WRONG\\_COMMAND\\_FOR\\_OBJECT\\_TYPE \n[SQLSTATE: 42809](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe operation `` requires a ``. But `` is a ``. Use `` instead. \n### [WRONG\\_NUM\\_ARGS](https://docs.databricks.com/error-messages/wrong-num-args-error-class.html) \n[SQLSTATE: 42605](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe `` requires `` parameters but the actual number is ``. \nFor more details see [WRONG\\_NUM\\_ARGS](https://docs.databricks.com/error-messages/wrong-num-args-error-class.html) \n### XML\\_ROW\\_TAG\\_MISSING \n[SQLSTATE: 42KDF](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` option is required for reading files in XML format. 
\n### XML\\_UNSUPPORTED\\_NESTED\\_TYPES \n[SQLSTATE: 0N000](https://docs.databricks.com/error-messages/sqlstates.html#class-0n-sqlxml-mapping-error) \nXML doesn\u2019t support `` as inner type of ``. Please wrap the `` within a StructType field when using it inside ``. \n### XML\\_WILDCARD\\_RESCUED\\_DATA\\_CONFLICT\\_ERROR \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nRescued data and wildcard column cannot be simultaneously enabled. Remove the wildcardColumnName option. \n### ZORDERBY\\_COLUMN\\_DOES\\_NOT\\_EXIST \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nZOrderBy column `` doesn\u2019t exist.\n\n", "chunk_id": "f4c4be25b386b73a276b0d6898543863", "url": "https://docs.databricks.com/error-messages/error-classes.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### Error classes in Databricks\n##### Delta Lake\n\n### DELTA\\_ACTIVE\\_SPARK\\_SESSION\\_NOT\\_FOUND \n[SQLSTATE: 08003](https://docs.databricks.com/error-messages/sqlstates.html#class-08-connection-exception) \nCould not find active SparkSession \n### DELTA\\_ACTIVE\\_TRANSACTION\\_ALREADY\\_SET \n[SQLSTATE: 0B000](https://docs.databricks.com/error-messages/sqlstates.html#class-0b-invalid-transaction-initiation) \nCannot set a new txn as active when one is already active \n### DELTA\\_ADDING\\_COLUMN\\_WITH\\_INTERNAL\\_NAME\\_FAILED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to add column `` because the name is reserved. \n### DELTA\\_ADDING\\_DELETION\\_VECTORS\\_DISALLOWED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe current operation attempted to add a deletion vector to a table that does not permit the creation of new deletion vectors. 
Please file a bug report. \n### DELTA\\_ADDING\\_DELETION\\_VECTORS\\_WITH\\_TIGHT\\_BOUNDS\\_DISALLOWED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nAll operations that add deletion vectors should set the tightBounds column in statistics to false. Please file a bug report. \n### DELTA\\_ADD\\_COLUMN\\_AT\\_INDEX\\_LESS\\_THAN\\_ZERO \n[SQLSTATE: 42KD3](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nIndex `` to add column `` is lower than 0 \n### DELTA\\_ADD\\_COLUMN\\_PARENT\\_NOT\\_STRUCT \n[SQLSTATE: 42KD3](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot add `` because its parent is not a StructType. Found `` \n### DELTA\\_ADD\\_COLUMN\\_STRUCT\\_NOT\\_FOUND \n[SQLSTATE: 42KD3](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nStruct not found at position `` \n### DELTA\\_ADD\\_CONSTRAINTS \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nPlease use ALTER TABLE ADD CONSTRAINT to add CHECK constraints. \n### DELTA\\_AGGREGATE\\_IN\\_GENERATED\\_COLUMN \n[SQLSTATE: 42621](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFound ``. A generated column cannot use an aggregate expression \n### DELTA\\_AGGREGATION\\_NOT\\_SUPPORTED \n[SQLSTATE: 42903](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nAggregate functions are not supported in the `` ``. 
\n### DELTA\\_ALTER\\_TABLE\\_CHANGE\\_COL\\_NOT\\_SUPPORTED \n[SQLSTATE: 42837](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nALTER TABLE CHANGE COLUMN is not supported for changing column `` to `` \n### DELTA\\_ALTER\\_TABLE\\_CLUSTER\\_BY\\_NOT\\_ALLOWED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nALTER TABLE CLUSTER BY is supported only for Delta table with Liquid clustering. \n### DELTA\\_ALTER\\_TABLE\\_CLUSTER\\_BY\\_ON\\_PARTITIONED\\_TABLE\\_NOT\\_ALLOWED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nALTER TABLE CLUSTER BY cannot be applied to a partitioned table. \n### DELTA\\_ALTER\\_TABLE\\_RENAME\\_NOT\\_ALLOWED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nOperation not allowed: ALTER TABLE RENAME TO is not allowed for managed Delta tables on S3, as eventual consistency on S3 may corrupt the Delta transaction log. If you insist on doing so and are sure that there has never been a Delta table with the new name `` before, you can enable this by setting `` to be true. \n### DELTA\\_ALTER\\_TABLE\\_SET\\_CLUSTERING\\_TABLE\\_FEATURE\\_NOT\\_ALLOWED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot enable `` table feature using ALTER TABLE SET TBLPROPERTIES. Please use CREATE OR REPLACE TABLE CLUSTER BY to create a Delta table with clustering. \n### DELTA\\_AMBIGUOUS\\_DATA\\_TYPE\\_CHANGE \n[SQLSTATE: 429BQ](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot change data type of `` from `` to ``. This change contains column removals and additions, therefore they are ambiguous. 
Please make these changes individually using ALTER TABLE [ADD | DROP | RENAME] COLUMN. \n### DELTA\\_AMBIGUOUS\\_PARTITION\\_COLUMN \n[SQLSTATE: 42702](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nAmbiguous partition column `` can be ``. \n### DELTA\\_AMBIGUOUS\\_PATHS\\_IN\\_CREATE\\_TABLE \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCREATE TABLE contains two different locations: `` and ``. \nYou can remove the LOCATION clause from the CREATE TABLE statement, or set \n`` to true to skip this check. \n### DELTA\\_ARCHIVED\\_FILES\\_IN\\_LIMIT \n[SQLSTATE: 42KDC](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nTable `
` does not contain enough records in non-archived files to satisfy specified LIMIT of `` records. \n### DELTA\\_ARCHIVED\\_FILES\\_IN\\_SCAN \n[SQLSTATE: 42KDC](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFound `` potentially archived file(s) in table `
` that need to be scanned as part of this query. \nArchived files cannot be accessed. The current time until archival is configured as ``. \nPlease adjust your query filters to exclude any archived files. \n### DELTA\\_BLOCK\\_COLUMN\\_MAPPING\\_AND\\_CDC\\_OPERATION \n[SQLSTATE: 42KD4](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nOperation \u201c``\u201d is not allowed when the table has enabled change data feed (CDF) and has undergone schema changes using DROP COLUMN or RENAME COLUMN. \n### DELTA\\_BLOOM\\_FILTER\\_DROP\\_ON\\_NON\\_EXISTING\\_COLUMNS \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot drop bloom filter indices for the following non-existent column(s): `` \n### DELTA\\_BLOOM\\_FILTER\\_OOM\\_ON\\_WRITE \n[SQLSTATE: 82100](https://docs.databricks.com/error-messages/sqlstates.html#class-82-out-of-memory) \nOutOfMemoryError occurred while writing bloom filter indices for the following column(s): ``. \nYou can reduce the memory footprint of bloom filter indices by choosing a smaller value for the \u2018numItems\u2019 option, a larger value for the \u2018fpp\u2019 option, or by indexing fewer columns. \n### DELTA\\_CANNOT\\_CHANGE\\_DATA\\_TYPE \n[SQLSTATE: 429BQ](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot change data type: `` \n### DELTA\\_CANNOT\\_CHANGE\\_LOCATION \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot change the \u2018location\u2019 of the Delta table using SET TBLPROPERTIES. Please use ALTER TABLE SET LOCATION instead. 
\n### DELTA\\_CANNOT\\_CHANGE\\_PROVIDER \n[SQLSTATE: 42939](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n\u2018provider\u2019 is a reserved table property, and cannot be altered. \n### DELTA\\_CANNOT\\_CREATE\\_BLOOM\\_FILTER\\_NON\\_EXISTING\\_COL \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot create bloom filter indices for the following non-existent column(s): `` \n### DELTA\\_CANNOT\\_CREATE\\_LOG\\_PATH \n[SQLSTATE: 42KD5](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot create `` \n### DELTA\\_CANNOT\\_DESCRIBE\\_VIEW\\_HISTORY \n[SQLSTATE: 42809](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot describe the history of a view. \n### DELTA\\_CANNOT\\_DROP\\_BLOOM\\_FILTER\\_ON\\_NON\\_INDEXED\\_COLUMN \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot drop bloom filter index on a non indexed column: `` \n### DELTA\\_CANNOT\\_EVALUATE\\_EXPRESSION \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot evaluate expression: `` \n### DELTA\\_CANNOT\\_FIND\\_BUCKET\\_SPEC \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nExpecting a bucketing Delta table but cannot find the bucket spec in the table \n### DELTA\\_CANNOT\\_GENERATE\\_CODE\\_FOR\\_EXPRESSION \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot generate code for expression: `` \n### DELTA\\_CANNOT\\_MODIFY\\_APPEND\\_ONLY \n[SQLSTATE: 42809](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThis table 
is configured to only allow appends. If you would like to permit updates or deletes, use \u2018ALTER TABLE SET TBLPROPERTIES (``=false)\u2019. \n### DELTA\\_CANNOT\\_MODIFY\\_TABLE\\_PROPERTY \n[SQLSTATE: 42939](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe Delta table configuration `` cannot be specified by the user \n### DELTA\\_CANNOT\\_RECONSTRUCT\\_PATH\\_FROM\\_URI \n[SQLSTATE: 22KD1](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nA uri (``) which can\u2019t be turned into a relative path was found in the transaction log. \n### DELTA\\_CANNOT\\_RELATIVIZE\\_PATH \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nA path (``) which can\u2019t be relativized with the current input found in the \ntransaction log. Please re-run this as: \n%%scala com.databricks.delta.Delta.fixAbsolutePathsInLog(\u201c``\u201d, true) \nand then also run: \n%%scala com.databricks.delta.Delta.fixAbsolutePathsInLog(\u201c``\u201d) \n### DELTA\\_CANNOT\\_RENAME\\_PATH \n[SQLSTATE: 22KD1](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot rename `` to `` \n### DELTA\\_CANNOT\\_REPLACE\\_MISSING\\_TABLE \n[SQLSTATE: 42P01](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nTable `` cannot be replaced as it does not exist. Use CREATE OR REPLACE TABLE to create the table. \n### DELTA\\_CANNOT\\_RESOLVE\\_COLUMN \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCan\u2019t resolve column `` in `` \n### DELTA\\_CANNOT\\_RESTORE\\_TABLE\\_VERSION \n[SQLSTATE: 22003](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot restore table to version ``. Available versions: [``, ``]. 
\n### DELTA\\_CANNOT\\_RESTORE\\_TIMESTAMP\\_GREATER \n[SQLSTATE: 22003](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot restore table to timestamp (``) as it is after the latest version available. Please use a timestamp before (``) \n### DELTA\\_CANNOT\\_SET\\_LOCATION\\_ON\\_PATH\\_IDENTIFIER \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot change the location of a path based table. \n### DELTA\\_CANNOT\\_UPDATE\\_ARRAY\\_FIELD \n[SQLSTATE: 429BQ](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot update %1$s field %2$s type: update the element by updating %2$s.element \n### DELTA\\_CANNOT\\_UPDATE\\_MAP\\_FIELD \n[SQLSTATE: 429BQ](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot update %1$s field %2$s type: update a map by updating %2$s.key or %2$s.value \n### DELTA\\_CANNOT\\_UPDATE\\_OTHER\\_FIELD \n[SQLSTATE: 429BQ](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot update `` field of type `` \n### DELTA\\_CANNOT\\_UPDATE\\_STRUCT\\_FIELD \n[SQLSTATE: 429BQ](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot update `` field `` type: update struct by adding, deleting, or updating its fields \n### DELTA\\_CANNOT\\_USE\\_ALL\\_COLUMNS\\_FOR\\_PARTITION \n[SQLSTATE: 428FT](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot use all columns for partition columns \n### DELTA\\_CANNOT\\_WRITE\\_INTO\\_VIEW \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`
` is a view. Writes to a view are not supported. \n### DELTA\\_CAST\\_OVERFLOW\\_IN\\_TABLE\\_WRITE \n[SQLSTATE: 22003](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to write a value of `` type into the `` type column `` due to an overflow. \nUse `try_cast` on the input value to tolerate overflow and return NULL instead. \nIf necessary, set `` to \u201cLEGACY\u201d to bypass this error or set `` to true to revert to the old behaviour and follow `` in UPDATE and MERGE. \n### DELTA\\_CDC\\_NOT\\_ALLOWED\\_IN\\_THIS\\_VERSION \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nConfiguration delta.enableChangeDataFeed cannot be set. Change data feed from Delta is not yet available. \n### DELTA\\_CHANGE\\_DATA\\_FEED\\_INCOMPATIBLE\\_DATA\\_SCHEMA \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nRetrieving table changes between version `` and `` failed because of an incompatible data schema. \nYour read schema is `` at version ``, but we found an incompatible data schema at version ``. \nIf possible, please retrieve the table changes using the end version\u2019s schema by setting `` to `endVersion`, or contact support. \n### DELTA\\_CHANGE\\_DATA\\_FEED\\_INCOMPATIBLE\\_SCHEMA\\_CHANGE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nRetrieving table changes between version `` and `` failed because of an incompatible schema change. \nYour read schema is `` at version ``, but we found an incompatible schema change at version ``. \nIf possible, please query table changes separately from version `` to `` - 1, and from version `` to ``. 
\n### DELTA\\_CHANGE\\_DATA\\_FILE\\_NOT\\_FOUND \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFile `` referenced in the transaction log cannot be found. This can occur when data has been manually deleted from the file system rather than using the table `DELETE` statement. This request appears to be targeting Change Data Feed, if that is the case, this error can occur when the change data file is out of the retention period and has been deleted by the `VACUUM` statement. For more information, see `` \n### DELTA\\_CHANGE\\_TABLE\\_FEED\\_DISABLED \n[SQLSTATE: 42807](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot write to table with delta.enableChangeDataFeed set. Change data feed from Delta is not available. \n### DELTA\\_CHECKPOINT\\_NON\\_EXIST\\_TABLE \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot checkpoint a non-existing table ``. Did you manually delete files in the \\_delta\\_log directory? \n### DELTA\\_CLONE\\_AMBIGUOUS\\_TARGET \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nTwo paths were provided as the CLONE target so it is ambiguous which to use. An external \nlocation for CLONE was provided at `` at the same time as the path \n``. \n### DELTA\\_CLONE\\_INCOMPLETE\\_FILE\\_COPY \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFile (``) not copied completely. Expected file size: ``, found: ``. To continue with the operation by ignoring the file size check set `` to false. 
\n### DELTA\\_CLONE\\_UNSUPPORTED\\_SOURCE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnsupported `` clone source \u2018``\u2019, whose format is ``. \nThe supported formats are \u2018delta\u2019, \u2018iceberg\u2019 and \u2018parquet\u2019. \n### DELTA\\_CLUSTERING\\_CLONE\\_TABLE\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCLONE is not supported for Delta table with Liquid clustering for DBR version < 14.0. \n### DELTA\\_CLUSTERING\\_COLUMNS\\_MISMATCH \n[SQLSTATE: 42P10](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe provided clustering columns do not match the existing table\u2019s. \n* provided: ``\n* existing: `` \n### DELTA\\_CLUSTERING\\_COLUMN\\_MISSING\\_STATS \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nLiquid clustering requires clustering columns to have stats. Couldn\u2019t find clustering column(s) \u2018``\u2019 in stats schema: \n`` \n### DELTA\\_CLUSTERING\\_CREATE\\_EXTERNAL\\_NON\\_LIQUID\\_TABLE\\_FROM\\_LIQUID\\_TABLE \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCreating an external table without liquid clustering from a table directory with liquid clustering is not allowed; path: ``. \n### DELTA\\_CLUSTERING\\_NOT\\_SUPPORTED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n\u2018``\u2019 does not support clustering. \n### DELTA\\_CLUSTERING\\_PHASE\\_OUT\\_FAILED \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot finish the `` of the table with `` table feature (reason: ``). Please try the OPTIMIZE command again. 
\n== Error == \n`` \n### DELTA\\_CLUSTERING\\_REPLACE\\_TABLE\\_WITH\\_PARTITIONED\\_TABLE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nREPLACE a Delta table with Liquid clustering with a partitioned table is not allowed. \n### DELTA\\_CLUSTERING\\_SHOW\\_CREATE\\_TABLE\\_WITHOUT\\_CLUSTERING\\_COLUMNS \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nSHOW CREATE TABLE is not supported for Delta table with Liquid clustering without any clustering columns. \n### DELTA\\_CLUSTERING\\_WITH\\_DYNAMIC\\_PARTITION\\_OVERWRITE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nDynamic partition overwrite mode is not allowed for Delta table with Liquid clustering. \n### DELTA\\_CLUSTERING\\_WITH\\_PARTITION\\_PREDICATE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nOPTIMIZE command for Delta table with Liquid clustering doesn\u2019t support partition predicates. Please remove the predicates: ``. \n### DELTA\\_CLUSTERING\\_WITH\\_ZORDER\\_BY \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nOPTIMIZE command for Delta table with Liquid clustering cannot specify ZORDER BY. Please remove ZORDER BY (``). \n### DELTA\\_CLUSTER\\_BY\\_INVALID\\_NUM\\_COLUMNS \n[SQLSTATE: 54000](https://docs.databricks.com/error-messages/sqlstates.html#class-54-program-limit-exceeded) \nCLUSTER BY for Liquid clustering supports up to `` clustering columns, but the table has `` clustering columns. Please remove the extra clustering columns. 
\n### DELTA\\_CLUSTER\\_BY\\_SCHEMA\\_NOT\\_PROVIDED \n[SQLSTATE: 42908](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nIt is not allowed to specify CLUSTER BY when the schema is not defined. Please define schema for table ``. \n### DELTA\\_CLUSTER\\_BY\\_WITH\\_BUCKETING \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nClustering and bucketing cannot both be specified. Please remove CLUSTERED BY INTO BUCKETS / bucketBy if you want to create a Delta table with clustering. \n### DELTA\\_CLUSTER\\_BY\\_WITH\\_PARTITIONED\\_BY \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nClustering and partitioning cannot both be specified. Please remove PARTITIONED BY / partitionBy / partitionedBy if you want to create a Delta table with clustering. \n### DELTA\\_COLUMN\\_DATA\\_SKIPPING\\_NOT\\_SUPPORTED\\_PARTITIONED\\_COLUMN \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nData skipping is not supported for partition column \u2018``\u2019. \n### DELTA\\_COLUMN\\_DATA\\_SKIPPING\\_NOT\\_SUPPORTED\\_TYPE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nData skipping is not supported for column \u2018``\u2019 of type ``. \n### DELTA\\_COLUMN\\_MAPPING\\_MAX\\_COLUMN\\_ID\\_NOT\\_SET \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe max column id property (``) is not set on a column mapping enabled table. 
\n### DELTA\\_COLUMN\\_MAPPING\\_MAX\\_COLUMN\\_ID\\_NOT\\_SET\\_CORRECTLY \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe max column id property (``) on a column mapping enabled table is ``, which cannot be smaller than the max column id for all fields (``). \n### DELTA\\_COLUMN\\_NOT\\_FOUND \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to find the column `` given [``] \n### DELTA\\_COLUMN\\_NOT\\_FOUND\\_IN\\_MERGE \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to find the column \u2018``\u2019 of the target table from the INSERT columns: ``. INSERT clause must specify value for all the columns of the target table. \n### DELTA\\_COLUMN\\_NOT\\_FOUND\\_IN\\_SCHEMA \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCouldn\u2019t find column `` in: \n`` \n### DELTA\\_COLUMN\\_PATH\\_NOT\\_NESTED \n[SQLSTATE: 42704](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nExpected `` to be a nested data type, but found ``. Was looking for the \nindex of `` in a nested field \n### DELTA\\_COLUMN\\_STRUCT\\_TYPE\\_MISMATCH \n[SQLSTATE: 2200G](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nStruct column `` cannot be inserted into a `` field `` in ``. \n### DELTA\\_COMPACTION\\_VALIDATION\\_FAILED \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThe validation of the compaction of path `` to `` failed: Please file a bug report. 
\n### DELTA\\_COMPLEX\\_TYPE\\_COLUMN\\_CONTAINS\\_NULL\\_TYPE \n[SQLSTATE: 22005](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFound nested NullType in column `` which is of ``. Delta doesn\u2019t support writing NullType in complex types. \n### DELTA\\_CONFLICT\\_SET\\_COLUMN \n[SQLSTATE: 42701](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThere is a conflict from these SET columns: ``. \n### DELTA\\_CONSTRAINT\\_ALREADY\\_EXISTS \n[SQLSTATE: 42710](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nConstraint \u2018``\u2019 already exists. Please delete the old constraint first. \nOld constraint: \n`` \n### DELTA\\_CONSTRAINT\\_DOES\\_NOT\\_EXIST \n[SQLSTATE: 42704](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot drop nonexistent constraint `` from table ``. To avoid throwing an error, provide the parameter IF EXISTS or set the SQL session configuration `` to ``. \n### DELTA\\_CONVERSION\\_NO\\_PARTITION\\_FOUND \n[SQLSTATE: 42KD6](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFound no partition information in the catalog for table ``. Have you run \u201cMSCK REPAIR TABLE\u201d on your table to discover partitions? \n### DELTA\\_CONVERSION\\_UNSUPPORTED\\_COLUMN\\_MAPPING \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe configuration \u2018``\u2019 cannot be set to `` when using CONVERT TO DELTA. 
\n### DELTA\\_CONVERT\\_NON\\_PARQUET\\_TABLE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCONVERT TO DELTA only supports parquet tables, but you are trying to convert a `` source: `` \n### DELTA\\_CONVERT\\_TO\\_DELTA\\_ROW\\_TRACKING\\_WITHOUT\\_STATS \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot enable row tracking without collecting statistics. \nIf you want to enable row tracking, do the following: \n1. Enable statistics collection by running the command \nSET `` = true \n2. Run CONVERT TO DELTA without the NO STATISTICS option. \nIf you do not want to collect statistics, disable row tracking: \n1. Deactivate enabling the table feature by default by running the command: \nRESET `` \n2. Deactivate the table property by default by running: \nSET `` = false \n### DELTA\\_COPY\\_INTO\\_TARGET\\_FORMAT \n[SQLSTATE: 0AKDD](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCOPY INTO target must be a Delta table. \n### DELTA\\_CREATE\\_EXTERNAL\\_TABLE\\_WITHOUT\\_SCHEMA \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nYou are trying to create an external table `` \nfrom `` using Delta, but the schema is not specified when the \ninput path is empty. \nTo learn more about Delta, see `` \n### DELTA\\_CREATE\\_EXTERNAL\\_TABLE\\_WITHOUT\\_TXN\\_LOG \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nYou are trying to create an external table `` \nfrom `%2$s` using Delta, but there is no transaction log present at \n`%2$s/_delta_log`. Check the upstream job to make sure that it is writing using \nformat(\u201cdelta\u201d) and that the path is the root of the table. 
\nTo learn more about Delta, see `` \n### DELTA\\_CREATE\\_TABLE\\_SCHEME\\_MISMATCH \n[SQLSTATE: 42KD7](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe specified schema does not match the existing schema at ``. \n== Specified == \n`` \n== Existing == \n`` \n== Differences == \n`` \nIf your intention is to keep the existing schema, you can omit the \nschema from the create table command. Otherwise please ensure that \nthe schema matches. \n### DELTA\\_CREATE\\_TABLE\\_SET\\_CLUSTERING\\_TABLE\\_FEATURE\\_NOT\\_ALLOWED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot enable `` table feature using TBLPROPERTIES. Please use CREATE OR REPLACE TABLE CLUSTER BY to create a Delta table with clustering. \n### DELTA\\_CREATE\\_TABLE\\_WITH\\_DIFFERENT\\_CLUSTERING \n[SQLSTATE: 42KD7](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe specified clustering columns do not match the existing clustering columns at ``. \n== Specified == \n`` \n== Existing == \n`` \n### DELTA\\_CREATE\\_TABLE\\_WITH\\_DIFFERENT\\_PARTITIONING \n[SQLSTATE: 42KD7](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe specified partitioning does not match the existing partitioning at ``. \n== Specified == \n`` \n== Existing == \n`` \n### DELTA\\_CREATE\\_TABLE\\_WITH\\_DIFFERENT\\_PROPERTY \n[SQLSTATE: 42KD7](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe specified properties do not match the existing properties at ``. 
\n== Specified == \n`` \n== Existing == \n`` \n### DELTA\\_CREATE\\_TABLE\\_WITH\\_NON\\_EMPTY\\_LOCATION \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot create table (\u2018``\u2019). The associated location (\u2018``\u2019) is not empty and also not a Delta table. \n### DELTA\\_DATA\\_CHANGE\\_FALSE \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot change table metadata because the \u2018dataChange\u2019 option is set to false. Attempted operation: \u2018``\u2019. \n### DELTA\\_DELETED\\_PARQUET\\_FILE\\_NOT\\_FOUND \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFile `` referenced in the transaction log cannot be found. This parquet file may be deleted under Delta\u2019s data retention policy. \nDefault Delta data retention duration: ``. Modification time of the parquet file: ``. Deletion time of the parquet file: ``. Deleted on Delta version: ``. \n### DELTA\\_DELETION\\_VECTOR\\_MISSING\\_NUM\\_RECORDS \n[SQLSTATE: 2D521](https://docs.databricks.com/error-messages/sqlstates.html#class-2d-invalid-transaction-termination) \nIt is invalid to commit files with deletion vectors that are missing the numRecords statistic. \n### DELTA\\_DOMAIN\\_METADATA\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDetected DomainMetadata action(s) for domains ``, but DomainMetadataTableFeature is not enabled. 
\n### DELTA\\_DROP\\_COLUMN\\_AT\\_INDEX\\_LESS\\_THAN\\_ZERO \n[SQLSTATE: 42KD8](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nIndex `` to drop column is lower than 0 \n### DELTA\\_DUPLICATE\\_ACTIONS\\_FOUND \n[SQLSTATE: 2D521](https://docs.databricks.com/error-messages/sqlstates.html#class-2d-invalid-transaction-termination) \nFile operation \u2018``\u2019 for path `` was specified several times. \nIt conflicts with ``. \nIt is not valid for multiple file operations with the same path to exist in a single commit. \n### DELTA\\_DUPLICATE\\_COLUMNS\\_FOUND \n[SQLSTATE: 42711](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFound duplicate column(s) ``: `` \n### DELTA\\_DUPLICATE\\_COLUMNS\\_ON\\_INSERT \n[SQLSTATE: 42701](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nDuplicate column names in INSERT clause \n### DELTA\\_DUPLICATE\\_COLUMNS\\_ON\\_UPDATE\\_TABLE \n[SQLSTATE: 42701](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` \nPlease remove duplicate columns before you update your table. \n### DELTA\\_DUPLICATE\\_DATA\\_SKIPPING\\_COLUMNS \n[SQLSTATE: 42701](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nDuplicated data skipping columns found: ``. 
\n### DELTA\\_DUPLICATE\\_DOMAIN\\_METADATA\\_INTERNAL\\_ERROR \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInternal error: two DomainMetadata actions within the same transaction have the same domain `` \n### DELTA\\_DV\\_HISTOGRAM\\_DESERIALIZATON \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCould not deserialize the deleted record counts histogram during table integrity verification. \n### DELTA\\_DYNAMIC\\_PARTITION\\_OVERWRITE\\_DISABLED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDynamic partition overwrite mode is specified by session config or write options, but it is disabled by `spark.databricks.delta.dynamicPartitionOverwrite.enabled=false`. \n### DELTA\\_EMPTY\\_DATA \n[SQLSTATE: 428GU](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nData used in creating the Delta table doesn\u2019t have any columns. \n### DELTA\\_EMPTY\\_DIRECTORY \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nNo file found in the directory: ``. \n### DELTA\\_EXCEED\\_CHAR\\_VARCHAR\\_LIMIT \n[SQLSTATE: 22001](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nExceeds char/varchar type length limitation. Failed check: ``. 
\n### DELTA\\_FAILED\\_CAST\\_PARTITION\\_VALUE \n[SQLSTATE: 22018](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to cast partition value `` to `` \n### DELTA\\_FAILED\\_FIND\\_ATTRIBUTE\\_IN\\_OUTPUT\\_COLUMNS \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCould not find `` among the existing target output `` \n### DELTA\\_FAILED\\_INFER\\_SCHEMA \n[SQLSTATE: 42KD9](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to infer schema from the given list of files. \n### DELTA\\_FAILED\\_MERGE\\_SCHEMA\\_FILE \n[SQLSTATE: 42KDA](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to merge schema of file ``: \n`` \n### DELTA\\_FAILED\\_READ\\_FILE\\_FOOTER \n[SQLSTATE: KD001](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nCould not read footer for file: `` \n### DELTA\\_FAILED\\_RECOGNIZE\\_PREDICATE \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot recognize the predicate \u2018``\u2019 \n### DELTA\\_FAILED\\_SCAN\\_WITH\\_HISTORICAL\\_VERSION \n[SQLSTATE: KD002](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nExpect a full scan of the latest version of the Delta source, but found a historical scan of version `` \n### DELTA\\_FAILED\\_TO\\_MERGE\\_FIELDS \n[SQLSTATE: 22005](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to merge fields \u2018``\u2019 and \u2018``\u2019 \n### DELTA\\_FEATURES\\_PROTOCOL\\_METADATA\\_MISMATCH \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnable to operate on this table because the following table 
features are enabled in metadata but not listed in protocol: ``. \n### DELTA\\_FEATURES\\_REQUIRE\\_MANUAL\\_ENABLEMENT \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nYour table schema requires manually enablement of the following table feature(s): ``. \nTo do this, run the following command for each of features listed above: \n`ALTER TABLE table_name SET TBLPROPERTIES ('delta.feature.feature_name' = 'supported')` \nReplace \u201ctable\\_name\u201d and \u201cfeature\\_name\u201d with real values. \nCurrent supported feature(s): ``. \n### DELTA\\_FEATURE\\_DROP\\_CONFLICT\\_REVALIDATION\\_FAIL \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot drop feature because a concurrent transaction modified the table. \nPlease try the operation again. \n`` \n### DELTA\\_FEATURE\\_DROP\\_FEATURE\\_NOT\\_PRESENT \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot drop `` from this table because it is not currently present in the table\u2019s protocol. \n### DELTA\\_FEATURE\\_DROP\\_HISTORICAL\\_VERSIONS\\_EXIST \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot drop `` because the Delta log contains historical versions that use the feature. \nPlease wait until the history retention period (``=``) \nhas passed since the feature was last active. \nAlternatively, please wait for the TRUNCATE HISTORY retention period to expire (``) \nand then run: \n`ALTER TABLE table_name DROP FEATURE feature_name TRUNCATE HISTORY` \n### DELTA\\_FEATURE\\_DROP\\_HISTORY\\_TRUNCATION\\_NOT\\_ALLOWED \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nHistory truncation is only relevant for reader features. 
\n### DELTA\\_FEATURE\\_DROP\\_NONREMOVABLE\\_FEATURE \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot drop `` because dropping this feature is not supported. \nPlease contact Databricks support. \n### DELTA\\_FEATURE\\_DROP\\_UNSUPPORTED\\_CLIENT\\_FEATURE \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot drop `` because it is not supported by this Databricks version. \nConsider using Databricks with a higher version. \n### DELTA\\_FEATURE\\_DROP\\_WAIT\\_FOR\\_RETENTION\\_PERIOD \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDropping `` was partially successful. \nThe feature is now no longer used in the current version of the table. However, the feature \nis still present in historical versions of the table. The table feature cannot be dropped \nfrom the table protocol until these historical versions have expired. \nTo drop the table feature from the protocol, please wait for the historical versions to \nexpire, and then repeat this command. The retention period for historical versions is \ncurrently configured as ``=``. \nAlternatively, please wait for the TRUNCATE HISTORY retention period to expire (``) \nand then run: \n`ALTER TABLE table_name DROP FEATURE feature_name TRUNCATE HISTORY` \n### DELTA\\_FEATURE\\_REQUIRES\\_HIGHER\\_READER\\_VERSION \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnable to enable table feature `` because it requires a higher reader protocol version (current ``). Consider upgrading the table\u2019s reader protocol version to ``, or to a version which supports reader table features. Refer to `` for more information on table protocol versions. 
\n### DELTA\\_FEATURE\\_REQUIRES\\_HIGHER\\_WRITER\\_VERSION \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnable to enable table feature `` because it requires a higher writer protocol version (current ``). Consider upgrading the table\u2019s writer protocol version to ``, or to a version which supports writer table features. Refer to `` for more information on table protocol versions. \n### DELTA\\_FILE\\_ALREADY\\_EXISTS \n[SQLSTATE: 42K04](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nExisting file path `` \n### DELTA\\_FILE\\_LIST\\_AND\\_PATTERN\\_STRING\\_CONFLICT \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot specify both file list and pattern string. \n### DELTA\\_FILE\\_NOT\\_FOUND \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFile path `` \n### DELTA\\_FILE\\_NOT\\_FOUND\\_DETAILED \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFile `` referenced in the transaction log cannot be found. This occurs when data has been manually deleted from the file system rather than using the table `DELETE` statement. 
For more information, see `` \n### DELTA\\_FILE\\_OR\\_DIR\\_NOT\\_FOUND \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nNo such file or directory: `` \n### DELTA\\_FILE\\_TO\\_OVERWRITE\\_NOT\\_FOUND \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFile (``) to be rewritten not found among candidate files: \n`` \n### DELTA\\_FOUND\\_MAP\\_TYPE\\_COLUMN \n[SQLSTATE: KD003](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nA MapType was found. In order to access the key or value of a MapType, specify one \nof: \n`` or \n`` \nfollowed by the name of the column (only if that column is a struct type). \ne.g. mymap.key.mykey \nIf the column is a basic type, mymap.key or mymap.value is sufficient. \n### DELTA\\_GENERATED\\_COLUMNS\\_DATA\\_TYPE\\_MISMATCH \n[SQLSTATE: 42K09](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nColumn `` is a generated column or a column used by a generated column. The data type is ``. It doesn\u2019t accept data type `` \n### DELTA\\_GENERATED\\_COLUMNS\\_EXPR\\_TYPE\\_MISMATCH \n[SQLSTATE: 42K09](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe expression type of the generated column `` is ``, but the column type is `` \n### DELTA\\_GENERATED\\_COLUMN\\_UPDATE\\_TYPE\\_MISMATCH \n[SQLSTATE: 42K09](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nColumn `` is a generated column or a column used by a generated column. 
The data type is `` and cannot be converted to data type `` \n### [DELTA\\_ICEBERG\\_COMPAT\\_VIOLATION](https://docs.databricks.com/error-messages/delta-iceberg-compat-violation-error-class.html) \n[SQLSTATE: KD00E](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nThe validation of IcebergCompatV`` has failed. \nFor more details see [DELTA\\_ICEBERG\\_COMPAT\\_VIOLATION](https://docs.databricks.com/error-messages/delta-iceberg-compat-violation-error-class.html) \n### DELTA\\_ILLEGAL\\_OPTION \n[SQLSTATE: 42616](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInvalid value \u2018``\u2019 for option \u2018``\u2019, `` \n### DELTA\\_ILLEGAL\\_USAGE \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe usage of `
` is not a Delta table. Please drop this table first if you would like to create it with Databricks Delta. \n### DELTA\\_NOT\\_A\\_DELTA\\_TABLE \n[SQLSTATE: 0AKDD](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` is not a Delta table. Please drop this table first if you would like to recreate it with Delta Lake. \n### DELTA\\_NOT\\_NULL\\_COLUMN\\_NOT\\_FOUND\\_IN\\_STRUCT \n[SQLSTATE: 42K09](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nNot nullable column not found in struct: `` \n### DELTA\\_NOT\\_NULL\\_CONSTRAINT\\_VIOLATED \n[SQLSTATE: 23502](https://docs.databricks.com/error-messages/sqlstates.html#class-23-integrity-constraint-violation) \nNOT NULL constraint violated for column: ``. \n### DELTA\\_NOT\\_NULL\\_NESTED\\_FIELD \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nA non-nullable nested field can\u2019t be added to a nullable parent. Please set the nullability of the parent column accordingly. \n### DELTA\\_NO\\_COMMITS\\_FOUND \n[SQLSTATE: KD006](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nNo commits found at `` \n### DELTA\\_NO\\_RECREATABLE\\_HISTORY\\_FOUND \n[SQLSTATE: KD006](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nNo recreatable commits found at `` \n### DELTA\\_NO\\_RELATION\\_TABLE \n[SQLSTATE: 42P01](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nTable `` not found \n### DELTA\\_NO\\_START\\_FOR\\_CDC\\_READ \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nNo startingVersion or startingTimestamp provided for CDC read. 
\n### DELTA\\_NULL\\_SCHEMA\\_IN\\_STREAMING\\_WRITE \n[SQLSTATE: 42P18](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nDelta doesn\u2019t accept NullTypes in the schema for streaming writes. \n### DELTA\\_ONEOF\\_IN\\_TIMETRAVEL \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPlease either provide \u2018timestampAsOf\u2019 or \u2018versionAsOf\u2019 for time travel. \n### DELTA\\_ONLY\\_OPERATION \n[SQLSTATE: 0AKDD](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` is only supported for Delta tables. \n### DELTA\\_OPERATION\\_MISSING\\_PATH \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPlease provide the path or table identifier for ``. \n### DELTA\\_OPERATION\\_NOT\\_ALLOWED \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nOperation not allowed: `` is not supported for Delta tables \n### DELTA\\_OPERATION\\_NOT\\_ALLOWED\\_DETAIL \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nOperation not allowed: `` is not supported for Delta tables: `` \n### DELTA\\_OPERATION\\_ON\\_TEMP\\_VIEW\\_WITH\\_GENERATED\\_COLS\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` command on a temp view referring to a Delta table that contains generated columns is not supported. Please run the `` command on the Delta table directly \n### DELTA\\_OVERWRITE\\_MUST\\_BE\\_TRUE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCopy option overwriteSchema cannot be specified without setting OVERWRITE = \u2018true\u2019. 
\n### DELTA\\_OVERWRITE\\_SCHEMA\\_WITH\\_DYNAMIC\\_PARTITION\\_OVERWRITE \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n\u2018overwriteSchema\u2019 cannot be used in dynamic partition overwrite mode. \n### DELTA\\_PARTITION\\_COLUMN\\_CAST\\_FAILED \n[SQLSTATE: 22525](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to cast value `` to `` for partition column `` \n### DELTA\\_PARTITION\\_COLUMN\\_NOT\\_FOUND \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPartition column `` not found in schema [``] \n### DELTA\\_PARTITION\\_SCHEMA\\_IN\\_ICEBERG\\_TABLES \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPartition schema cannot be specified when converting Iceberg tables. It is automatically inferred. \n### DELTA\\_PATH\\_DOES\\_NOT\\_EXIST \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` doesn\u2019t exist, or is not a Delta table. \n### DELTA\\_PATH\\_EXISTS \n[SQLSTATE: 42K04](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot write to already existent path `` without setting OVERWRITE = \u2018true\u2019. \n### DELTA\\_POST\\_COMMIT\\_HOOK\\_FAILED \n[SQLSTATE: 2DKD0](https://docs.databricks.com/error-messages/sqlstates.html#class-2d-invalid-transaction-termination) \nCommitting to the Delta table version `` succeeded but error while executing post-commit hook `` `` \n### DELTA\\_PROTOCOL\\_PROPERTY\\_NOT\\_INT \n[SQLSTATE: 42K06](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nProtocol property `` needs to be an integer. 
Found `` \n### DELTA\\_READ\\_FEATURE\\_PROTOCOL\\_REQUIRES\\_WRITE \n[SQLSTATE: KD004](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nUnable to upgrade only the reader protocol version to use table features. Writer protocol version must be at least `` to proceed. Refer to `` for more information on table protocol versions. \n### DELTA\\_READ\\_TABLE\\_WITHOUT\\_COLUMNS \n[SQLSTATE: 428GU](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nYou are trying to read a Delta table `` that does not have any columns. \nWrite some new data with the option `mergeSchema = true` to be able to read the table. \n### DELTA\\_REGEX\\_OPT\\_SYNTAX\\_ERROR \n[SQLSTATE: 2201B](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nPlease recheck your syntax for \u2018``\u2019 \n### DELTA\\_REPLACE\\_WHERE\\_IN\\_OVERWRITE \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nYou can\u2019t use replaceWhere in conjunction with an overwrite by filter \n### DELTA\\_REPLACE\\_WHERE\\_MISMATCH \n[SQLSTATE: 44000](https://docs.databricks.com/error-messages/sqlstates.html#class-44-with-check-option-violation) \nWritten data does not conform to partial table overwrite condition or constraint \u2018``\u2019. \n`` \n### DELTA\\_REPLACE\\_WHERE\\_WITH\\_DYNAMIC\\_PARTITION\\_OVERWRITE \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nA \u2018replaceWhere\u2019 expression and \u2018partitionOverwriteMode\u2019=\u2019dynamic\u2019 cannot both be set in the DataFrameWriter options. 
\n### DELTA\\_REPLACE\\_WHERE\\_WITH\\_FILTER\\_DATA\\_CHANGE\\_UNSET \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n\u2018replaceWhere\u2019 cannot be used with data filters when \u2018dataChange\u2019 is set to false. Filters: `` \n### DELTA\\_ROW\\_ID\\_ASSIGNMENT\\_WITHOUT\\_STATS \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot assign row IDs without row count statistics. \nCollect statistics for the table by running the following code in a Scala notebook and retry: \nimport com.databricks.sql.transaction.tahoe.DeltaLog \nimport com.databricks.sql.transaction.tahoe.stats.StatisticsCollection \nimport org.apache.spark.sql.catalyst.TableIdentifier \nval log = DeltaLog.forTable(spark, TableIdentifier(table\\_name)) \nStatisticsCollection.recompute(spark, log) \n### DELTA\\_SCHEMA\\_CHANGED \n[SQLSTATE: KD007](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nDetected schema change: \nstreaming source schema: `` \ndata file schema: `` \nPlease try restarting the query. If this issue repeats across query restarts without \nmaking progress, you have made an incompatible schema change and need to start your \nquery from scratch using a new checkpoint directory. \n### DELTA\\_SCHEMA\\_CHANGED\\_WITH\\_STARTING\\_OPTIONS \n[SQLSTATE: KD007](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nDetected schema change in version ``: \nstreaming source schema: `` \ndata file schema: `` \nPlease try restarting the query. If this issue repeats across query restarts without \nmaking progress, you have made an incompatible schema change and need to start your \nquery from scratch using a new checkpoint directory. 
If the issue persists after \nchanging to a new checkpoint directory, you may need to change the existing \n\u2018startingVersion\u2019 or \u2018startingTimestamp\u2019 option to start from a version newer than \n`` with a new checkpoint directory. \n### DELTA\\_SCHEMA\\_CHANGED\\_WITH\\_VERSION \n[SQLSTATE: KD007](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nDetected schema change in version ``: \nstreaming source schema: `` \ndata file schema: `` \nPlease try restarting the query. If this issue repeats across query restarts without \nmaking progress, you have made an incompatible schema change and need to start your \nquery from scratch using a new checkpoint directory. \n### DELTA\\_SCHEMA\\_CHANGE\\_SINCE\\_ANALYSIS \n[SQLSTATE: KD007](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nThe schema of your Delta table has changed in an incompatible way since your DataFrame \nor DeltaTable object was created. Please redefine your DataFrame or DeltaTable object. \nChanges: \n`` `` \n### DELTA\\_SCHEMA\\_NOT\\_PROVIDED \n[SQLSTATE: 42908](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nTable schema is not provided. Please provide the schema (column definition) of the table when using REPLACE table and an AS SELECT query is not provided. \n### DELTA\\_SCHEMA\\_NOT\\_SET \n[SQLSTATE: KD008](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nTable schema is not set. Write data into it or use CREATE TABLE to set the schema. \n### DELTA\\_SET\\_LOCATION\\_SCHEMA\\_MISMATCH \n[SQLSTATE: 42KD7](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe schema of the new Delta location is different than the current table schema. 
\noriginal schema: \n`` \ndestination schema: \n`` \nIf this is an intended change, you may turn this check off by running: \n%%sql set `` = true \n### DELTA\\_SHALLOW\\_CLONE\\_FILE\\_NOT\\_FOUND \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFile `` referenced in the transaction log cannot be found. This can occur when data has been manually deleted from the file system rather than using the table `DELETE` statement. This table appears to be a shallow clone, if that is the case, this error can occur when the original table from which this table was cloned has deleted a file that the clone is still using. If you want any clones to be independent of the original table, use a DEEP clone instead. \n### DELTA\\_SHARING\\_CURRENT\\_RECIPIENT\\_PROPERTY\\_UNDEFINED \n[SQLSTATE: 42704](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe data is restricted by recipient property `` that do not apply to the current recipient in the session. Please contact the data provider to resolve the issue. \n### DELTA\\_SHARING\\_INVALID\\_OP\\_IN\\_EXTERNAL\\_SHARED\\_VIEW \n[SQLSTATE: 42887](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` cannot be used in Delta Sharing views that are shared cross account. 
\n### DELTA\\_SHOW\\_PARTITION\\_IN\\_NON\\_PARTITIONED\\_COLUMN \n[SQLSTATE: 42P10](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nNon-partitioning column(s) `` are specified for SHOW PARTITIONS \n### DELTA\\_SHOW\\_PARTITION\\_IN\\_NON\\_PARTITIONED\\_TABLE \n[SQLSTATE: 42809](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nSHOW PARTITIONS is not allowed on a table that is not partitioned: `` \n### DELTA\\_SOURCE\\_IGNORE\\_DELETE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDetected deleted data (for example ``) from streaming source at version ``. This is currently not supported. If you\u2019d like to ignore deletes, set the option \u2018ignoreDeletes\u2019 to \u2018true\u2019. The source table can be found at path ``. \n### DELTA\\_SOURCE\\_TABLE\\_IGNORE\\_CHANGES \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDetected a data update (for example ``) in the source table at version ``. This is currently not supported. If this is going to happen regularly and you are okay to skip changes, set the option \u2018skipChangeCommits\u2019 to \u2018true\u2019. If you would like the data update to be reflected, please restart this query with a fresh checkpoint directory or do a full refresh if you are using DLT. If you need to handle these changes, please switch to MVs. The source table can be found at path ``. 
\n### DELTA\\_STARTING\\_VERSION\\_AND\\_TIMESTAMP\\_BOTH\\_SET \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPlease either provide \u2018``\u2019 or \u2018``\u2019 \n### DELTA\\_STATS\\_COLLECTION\\_COLUMN\\_NOT\\_FOUND \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` stats not found for column in Parquet metadata: ``. \n### DELTA\\_STREAMING\\_CANNOT\\_CONTINUE\\_PROCESSING\\_POST\\_SCHEMA\\_EVOLUTION \n[SQLSTATE: KD002](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nWe\u2019ve detected one or more non-additive schema change(s) (``) between Delta version `` and `` in the Delta streaming source. \nPlease check if you want to manually propagate the schema change(s) to the sink table before we proceed with stream processing using the finalized schema at ``. \nOnce you have fixed the schema of the sink table or have decided there is no need to fix, you can set (one of) the following SQL configurations to unblock the non-additive schema change(s) and continue stream processing. \nTo unblock for this particular stream just for this series of schema change(s): set `` = ``. \nTo unblock for this particular stream: set `` = `` \nTo unblock for all streams: set `` = ``. \nAlternatively if applicable, you may replace the `` with `` in the SQL conf to unblock stream for just this schema change type. \n### DELTA\\_STREAMING\\_CHECK\\_COLUMN\\_MAPPING\\_NO\\_SNAPSHOT \n[SQLSTATE: KD002](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nFailed to obtain Delta log snapshot for the start version when checking column mapping schema changes. Please choose a different start version, or force enable streaming read at your own risk by setting \u2018``\u2019 to \u2018true\u2019. 
\n### DELTA\\_STREAMING\\_INCOMPATIBLE\\_SCHEMA\\_CHANGE \n[SQLSTATE: 42KD4](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nStreaming read is not supported on tables with read-incompatible schema changes (e.g. rename or drop or datatype changes). \nFor further information and possible next steps to resolve this issue, please review the documentation at `` \nRead schema: ``. Incompatible data schema: ``. \n### DELTA\\_STREAMING\\_INCOMPATIBLE\\_SCHEMA\\_CHANGE\\_USE\\_SCHEMA\\_LOG \n[SQLSTATE: 42KD4](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nStreaming read is not supported on tables with read-incompatible schema changes (e.g. rename or drop or datatype changes). \nPlease provide a \u2018schemaTrackingLocation\u2019 to enable non-additive schema evolution for Delta stream processing. \nSee `` for more details. \nRead schema: ``. Incompatible data schema: ``. \n### DELTA\\_STREAMING\\_METADATA\\_EVOLUTION \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThe schema, table configuration or protocol of your Delta table has changed during streaming. \nThe schema or metadata tracking log has been updated. \nPlease restart the stream to continue processing using the updated metadata. \nUpdated schema: ``. \nUpdated table configurations: ``. \nUpdated table protocol: `` \n### DELTA\\_STREAMING\\_SCHEMA\\_EVOLUTION\\_UNSUPPORTED\\_ROW\\_FILTER\\_COLUMN\\_MASKS \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nStreaming from source table `` with schema tracking does not support row filters or column masks. \nPlease drop the row filters or column masks, or disable schema tracking. 
\n### DELTA\\_STREAMING\\_SCHEMA\\_LOCATION\\_CONFLICT \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nDetected conflicting schema location \u2018``\u2019 while streaming from table or table located at \u2018`
`\u2019. \nAnother stream may be reusing the same schema location, which is not allowed. \nPlease provide a new unique `schemaTrackingLocation` path or `streamingSourceTrackingId` as a reader option for one of the streams from this table. \n### DELTA\\_STREAMING\\_SCHEMA\\_LOCATION\\_NOT\\_UNDER\\_CHECKPOINT \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nSchema location \u2018``\u2019 must be placed under checkpoint location \u2018``\u2019. \n### DELTA\\_STREAMING\\_SCHEMA\\_LOG\\_DESERIALIZE\\_FAILED \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nIncomplete log file in the Delta streaming source schema log at \u2018``\u2019. \nThe schema log may have been corrupted. Please pick a new schema location. \n### DELTA\\_STREAMING\\_SCHEMA\\_LOG\\_INCOMPATIBLE\\_DELTA\\_TABLE\\_ID \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nDetected incompatible Delta table id when trying to read Delta stream. \nPersisted table id: ``, Table id: `` \nThe schema log might have been reused. Please pick a new schema location. \n### DELTA\\_STREAMING\\_SCHEMA\\_LOG\\_INCOMPATIBLE\\_PARTITION\\_SCHEMA \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nDetected incompatible partition schema when trying to read Delta stream. \nPersisted schema: ``, Delta partition schema: `` \nPlease pick a new schema location to reinitialize the schema log if you have manually changed the table\u2019s partition schema recently. 
\n### DELTA\\_STREAMING\\_SCHEMA\\_LOG\\_INIT\\_FAILED\\_INCOMPATIBLE\\_METADATA \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nWe could not initialize the Delta streaming source schema log because \nwe detected an incompatible schema or protocol change while serving a streaming batch from table version `` to ``. \n### DELTA\\_STREAMING\\_SCHEMA\\_LOG\\_PARSE\\_SCHEMA\\_FAILED \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to parse the schema from the Delta streaming source schema log. \nThe schema log may have been corrupted. Please pick a new schema location. \n### DELTA\\_TABLE\\_ALREADY\\_CONTAINS\\_CDC\\_COLUMNS \n[SQLSTATE: 42711](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to enable Change Data Capture on the table. The table already contains \nreserved columns `` that will \nbe used internally as metadata for the table\u2019s Change Data Feed. To enable \nChange Data Feed on the table rename/drop these columns. \n### DELTA\\_TABLE\\_ALREADY\\_EXISTS \n[SQLSTATE: 42P07](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nTable `` already exists. \n### DELTA\\_TABLE\\_FOR\\_PATH\\_UNSUPPORTED\\_HADOOP\\_CONF \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCurrently DeltaTable.forPath only supports hadoop configuration keys starting with `` but got `` \n### DELTA\\_TABLE\\_ID\\_MISMATCH \n[SQLSTATE: KD007](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nThe Delta table at `` has been replaced while this command was using the table. \nTable id was `` but is now ``. \nPlease retry the current command to ensure it reads a consistent view of the table. 
\n### DELTA\\_TABLE\\_LOCATION\\_MISMATCH \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe location of the existing table `` is ``. It doesn\u2019t match the specified location ``. \n### DELTA\\_TABLE\\_NOT\\_FOUND \n[SQLSTATE: 42P01](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nDelta table `` doesn\u2019t exist. \n### DELTA\\_TABLE\\_NOT\\_SUPPORTED\\_IN\\_OP \n[SQLSTATE: 42809](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nTable is not supported in ``. Please use a path instead. \n### DELTA\\_TABLE\\_ONLY\\_OPERATION \n[SQLSTATE: 0AKDD](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` is not a Delta table. `` is only supported for Delta tables. \n### DELTA\\_TARGET\\_TABLE\\_FINAL\\_SCHEMA\\_EMPTY \n[SQLSTATE: 428GU](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nTarget table final schema is empty. \n### DELTA\\_TIMESTAMP\\_GREATER\\_THAN\\_COMMIT \n[SQLSTATE: 42816](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe provided timestamp (``) is after the latest version available to this \ntable (``). Please use a timestamp before or at ``. \n### DELTA\\_TIMESTAMP\\_INVALID \n[SQLSTATE: 42816](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe provided timestamp (``) cannot be converted to a valid timestamp. \n### DELTA\\_TIME\\_TRAVEL\\_INVALID\\_BEGIN\\_VALUE \n[SQLSTATE: 42604](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` needs to be a valid begin value. 
\n### DELTA\\_TRUNCATED\\_TRANSACTION\\_LOG \n[SQLSTATE: 42K03](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n``: Unable to reconstruct state at version `` as the transaction log has been truncated due to manual deletion or the log retention policy (``=``) and checkpoint retention policy (``=``) \n### DELTA\\_TRUNCATE\\_TABLE\\_PARTITION\\_NOT\\_SUPPORTED \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nOperation not allowed: TRUNCATE TABLE on Delta tables does not support partition predicates; use DELETE to delete specific partitions or rows. \n### DELTA\\_UDF\\_IN\\_GENERATED\\_COLUMN \n[SQLSTATE: 42621](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFound ``. A generated column cannot use a user-defined function \n### DELTA\\_UNEXPECTED\\_ACTION\\_EXPRESSION \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnexpected action expression ``. \n### DELTA\\_UNEXPECTED\\_NUM\\_PARTITION\\_COLUMNS\\_FROM\\_FILE\\_NAME \n[SQLSTATE: KD009](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nExpecting `` partition column(s): ``, but found `` partition column(s): `` from parsing the file name: `` \n### DELTA\\_UNEXPECTED\\_PARTIAL\\_SCAN \n[SQLSTATE: KD00A](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nExpect a full scan of Delta sources, but found a partial scan. 
path:`` \n### DELTA\\_UNEXPECTED\\_PARTITION\\_COLUMN\\_FROM\\_FILE\\_NAME \n[SQLSTATE: KD009](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nExpecting partition column ``, but found partition column `` from parsing the file name: `` \n### DELTA\\_UNEXPECTED\\_PARTITION\\_SCHEMA\\_FROM\\_USER \n[SQLSTATE: KD009](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nCONVERT TO DELTA was called with a partition schema different from the partition schema inferred from the catalog, please avoid providing the schema so that the partition schema can be chosen from the catalog. \ncatalog partition schema: \n`` \nprovided partition schema: \n`` \n### DELTA\\_UNIFORM\\_NOT\\_SUPPORTED \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUniversal Format is only supported on Unity Catalog tables. \n### DELTA\\_UNIVERSAL\\_FORMAT\\_VIOLATION \n[SQLSTATE: KD00E](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nThe validation of Universal Format (``) has failed: `` \n### DELTA\\_UNKNOWN\\_CONFIGURATION \n[SQLSTATE: F0000](https://docs.databricks.com/error-messages/sqlstates.html#class-f0-configuration-file-error) \nUnknown configuration was specified: `` \n### DELTA\\_UNKNOWN\\_PRIVILEGE \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnknown privilege: `` \n### DELTA\\_UNKNOWN\\_READ\\_LIMIT \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnknown ReadLimit: `` \n### DELTA\\_UNRECOGNIZED\\_COLUMN\\_CHANGE \n[SQLSTATE: 42601](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnrecognized column change ``. You may be running an out-of-date Delta Lake version. 
\n### DELTA\\_UNRECOGNIZED\\_INVARIANT \n[SQLSTATE: 56038](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nUnrecognized invariant. Please upgrade your Spark version. \n### DELTA\\_UNRECOGNIZED\\_LOGFILE \n[SQLSTATE: KD00B](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nUnrecognized log file `` \n### DELTA\\_UNSET\\_NON\\_EXISTENT\\_PROPERTY \n[SQLSTATE: 42616](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nAttempted to unset non-existent property \u2018``\u2019 in table `` \n### DELTA\\_UNSUPPORTED\\_ABS\\_PATH\\_ADD\\_FILE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` does not support adding files with an absolute path \n### DELTA\\_UNSUPPORTED\\_ALTER\\_TABLE\\_CHANGE\\_COL\\_OP \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nALTER TABLE CHANGE COLUMN is not supported for changing column `` from `` to `` \n### DELTA\\_UNSUPPORTED\\_ALTER\\_TABLE\\_REPLACE\\_COL\\_OP \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnsupported ALTER TABLE REPLACE COLUMNS operation. Reason: `
` \nFailed to change schema from: \n`` \nto: \n`` \n### DELTA\\_UNSUPPORTED\\_CLONE\\_REPLACE\\_SAME\\_TABLE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nYou tried to REPLACE an existing table (``) with CLONE. This operation is \nunsupported. Try a different target for CLONE or delete the table at the current target. \n### DELTA\\_UNSUPPORTED\\_COLUMN\\_MAPPING\\_MODE\\_CHANGE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nChanging column mapping mode from \u2018``\u2019 to \u2018``\u2019 is not supported. \n### DELTA\\_UNSUPPORTED\\_COLUMN\\_MAPPING\\_PROTOCOL \n[SQLSTATE: KD004](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nYour current table protocol version does not support changing column mapping modes \nusing ``. \nRequired Delta protocol version for column mapping: \n`` \nYour table\u2019s current Delta protocol version: \n`` \n`` \n### DELTA\\_UNSUPPORTED\\_COLUMN\\_MAPPING\\_SCHEMA\\_CHANGE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nSchema change is detected: \nold schema: \n`` \nnew schema: \n`` \nSchema changes are not allowed during the change of column mapping mode. \n### DELTA\\_UNSUPPORTED\\_COLUMN\\_MAPPING\\_WRITE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nWriting data with column mapping mode is not supported. 
\n### DELTA\\_UNSUPPORTED\\_COLUMN\\_TYPE\\_IN\\_BLOOM\\_FILTER \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCreating a bloom filter index on a column with type `` is unsupported: `` \n### DELTA\\_UNSUPPORTED\\_COMMENT\\_MAP\\_ARRAY \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCan\u2019t add a comment to ``. Adding a comment to a map key/value or array element is not supported. \n### DELTA\\_UNSUPPORTED\\_DATA\\_TYPES \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nFound columns using unsupported data types: ``. You can set \u2018``\u2019 to \u2018false\u2019 to disable the type check. Disabling this type check may allow users to create unsupported Delta tables and should only be used when trying to read/write legacy tables. \n### DELTA\\_UNSUPPORTED\\_DEEP\\_CLONE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDeep clone is not supported for this Delta version. \n### DELTA\\_UNSUPPORTED\\_DESCRIBE\\_DETAIL\\_VIEW \n[SQLSTATE: 42809](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` is a view. DESCRIBE DETAIL is only supported for tables. \n### DELTA\\_UNSUPPORTED\\_DROP\\_CLUSTERING\\_COLUMN \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDropping clustering columns (``) is not allowed. \n### DELTA\\_UNSUPPORTED\\_DROP\\_COLUMN \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDROP COLUMN is not supported for your Delta table. 
`` \n### DELTA\\_UNSUPPORTED\\_DROP\\_NESTED\\_COLUMN\\_FROM\\_NON\\_STRUCT\\_TYPE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCan only drop nested columns from StructType. Found `` \n### DELTA\\_UNSUPPORTED\\_DROP\\_PARTITION\\_COLUMN \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDropping partition columns (``) is not allowed. \n### DELTA\\_UNSUPPORTED\\_EXPRESSION \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnsupported expression type(``) for ``. The supported types are [``]. \n### DELTA\\_UNSUPPORTED\\_EXPRESSION\\_GENERATED\\_COLUMN \n[SQLSTATE: 42621](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` cannot be used in a generated column \n### DELTA\\_UNSUPPORTED\\_FEATURES\\_FOR\\_READ \n[SQLSTATE: 56038](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nUnsupported Delta read feature: table \u201c``\u201d requires reader table feature(s) that are unsupported by this version of Databricks: ``. Please refer to `` for more information on Delta Lake feature compatibility. \n### DELTA\\_UNSUPPORTED\\_FEATURES\\_FOR\\_WRITE \n[SQLSTATE: 56038](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nUnsupported Delta write feature: table \u201c``\u201d requires writer table feature(s) that are unsupported by this version of Databricks: ``. Please refer to `` for more information on Delta Lake feature compatibility. 
\n### DELTA\\_UNSUPPORTED\\_FEATURES\\_IN\\_CONFIG \n[SQLSTATE: 56038](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nTable feature(s) configured in the following Spark configs or Delta table properties are not recognized by this version of Databricks: ``. \n### DELTA\\_UNSUPPORTED\\_FEATURE\\_STATUS \n[SQLSTATE: 0AKDE](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nExpecting the status for table feature `` to be \u201csupported\u201d, but got \u201c``\u201d. \n### DELTA\\_UNSUPPORTED\\_FIELD\\_UPDATE\\_NON\\_STRUCT \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUpdating nested fields is only supported for StructType, but you are trying to update a field of ``, which is of type: ``. \n### DELTA\\_UNSUPPORTED\\_FSCK\\_WITH\\_DELETION\\_VECTORS \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe \u2018FSCK REPAIR TABLE\u2019 command is not supported on table versions with missing deletion vector files. \nPlease contact support. \n### DELTA\\_UNSUPPORTED\\_GENERATE\\_WITH\\_DELETION\\_VECTORS \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe \u2018GENERATE symlink\\_format\\_manifest\u2019 command is not supported on table versions with deletion vectors. \nIn order to produce a version of the table without deletion vectors, run \u2018REORG TABLE table APPLY (PURGE)\u2019. Then re-run the \u2018GENERATE\u2019 command. \nMake sure that no concurrent transactions are adding deletion vectors again between REORG and GENERATE. \nIf you need to generate manifests regularly, or you cannot prevent concurrent transactions, consider disabling deletion vectors on this table using \u2018ALTER TABLE table SET TBLPROPERTIES (delta.enableDeletionVectors = false)\u2019. 
\n### DELTA\\_UNSUPPORTED\\_INVARIANT\\_NON\\_STRUCT \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nInvariants on nested fields other than StructTypes are not supported. \n### DELTA\\_UNSUPPORTED\\_IN\\_SUBQUERY \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nIn subquery is not supported in the `` condition. \n### DELTA\\_UNSUPPORTED\\_LIST\\_KEYS\\_WITH\\_PREFIX \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nlistKeywithPrefix not available \n### DELTA\\_UNSUPPORTED\\_MANIFEST\\_GENERATION\\_WITH\\_COLUMN\\_MAPPING \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nManifest generation is not supported for tables that leverage column mapping, as external readers cannot read these Delta tables. See Delta documentation for more details. \n### DELTA\\_UNSUPPORTED\\_MERGE\\_SCHEMA\\_EVOLUTION\\_WITH\\_CDC \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nMERGE INTO operations with schema evolution do not currently support writing CDC output. \n### DELTA\\_UNSUPPORTED\\_MULTI\\_COL\\_IN\\_PREDICATE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nMulti-column In predicates are not supported in the `` condition. \n### DELTA\\_UNSUPPORTED\\_NESTED\\_COLUMN\\_IN\\_BLOOM\\_FILTER \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCreating a bloom filter index on a nested column is currently unsupported: `` \n### DELTA\\_UNSUPPORTED\\_NESTED\\_FIELD\\_IN\\_OPERATION \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nNested field is not supported in the `` (field = ``). 
\n### DELTA\\_UNSUPPORTED\\_NON\\_EMPTY\\_CLONE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe clone destination table is non-empty. Please TRUNCATE or DELETE FROM the table before running CLONE. \n### DELTA\\_UNSUPPORTED\\_OUTPUT\\_MODE \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nData source `` does not support `` output mode \n### DELTA\\_UNSUPPORTED\\_PARTITION\\_COLUMN\\_IN\\_BLOOM\\_FILTER \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCreating a bloom filter index on a partitioning column is unsupported: `` \n### DELTA\\_UNSUPPORTED\\_RENAME\\_COLUMN \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nColumn rename is not supported for your Delta table. `` \n### DELTA\\_UNSUPPORTED\\_SCHEMA\\_DURING\\_READ \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nDelta does not support specifying the schema at read time. \n### DELTA\\_UNSUPPORTED\\_SORT\\_ON\\_BUCKETED\\_TABLES \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nSORTED BY is not supported for Delta bucketed tables \n### DELTA\\_UNSUPPORTED\\_SOURCE \n[SQLSTATE: 0AKDD](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` destination only supports Delta sources. 
\n`` \n### DELTA\\_UNSUPPORTED\\_STATIC\\_PARTITIONS \n[SQLSTATE: 0AKDD](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nSpecifying static partitions in the partition spec is currently not supported during inserts \n### DELTA\\_UNSUPPORTED\\_STRATEGY\\_NAME \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nUnsupported strategy name: `` \n### DELTA\\_UNSUPPORTED\\_SUBQUERY \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nSubqueries are not supported in the `` (condition = ``). \n### DELTA\\_UNSUPPORTED\\_SUBQUERY\\_IN\\_PARTITION\\_PREDICATES \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nSubquery is not supported in partition predicates. \n### DELTA\\_UNSUPPORTED\\_TIME\\_TRAVEL\\_MULTIPLE\\_FORMATS \n[SQLSTATE: 42613](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot specify time travel in multiple formats. \n### DELTA\\_UNSUPPORTED\\_TIME\\_TRAVEL\\_VIEWS \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nCannot time travel views, subqueries, streams or change data feed queries. \n### DELTA\\_UNSUPPORTED\\_TRUNCATE\\_SAMPLE\\_TABLES \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nTruncate sample tables is not supported \n### DELTA\\_UNSUPPORTED\\_VACUUM\\_SPECIFIC\\_PARTITION \n[SQLSTATE: 0AKDC](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nPlease provide the base path (``) when Vacuuming Delta tables. Vacuuming specific partitions is currently not supported. 
\n### DELTA\\_UNSUPPORTED\\_WRITES\\_STAGED\\_TABLE \n[SQLSTATE: 42807](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nTable implementation does not support writes: `` \n### DELTA\\_UNSUPPORTED\\_WRITE\\_SAMPLE\\_TABLES \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nWrite to sample tables is not supported \n### DELTA\\_UPDATE\\_SCHEMA\\_MISMATCH\\_EXPRESSION \n[SQLSTATE: 42846](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot cast `` to ``. All nested columns must match. \n### [DELTA\\_VERSIONS\\_NOT\\_CONTIGUOUS](https://docs.databricks.com/error-messages/delta-versions-not-contiguous-error-class.html) \n[SQLSTATE: KD00C](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nVersions (``) are not contiguous. \nFor more details see [DELTA\\_VERSIONS\\_NOT\\_CONTIGUOUS](https://docs.databricks.com/error-messages/delta-versions-not-contiguous-error-class.html) \n### DELTA\\_VIOLATE\\_CONSTRAINT\\_WITH\\_VALUES \n[SQLSTATE: 23001](https://docs.databricks.com/error-messages/sqlstates.html#class-23-integrity-constraint-violation) \nCHECK constraint `` `` violated by row with values: \n`` \n### [DELTA\\_VIOLATE\\_TABLE\\_PROPERTY\\_VALIDATION\\_FAILED](https://docs.databricks.com/error-messages/delta-violate-table-property-validation-failed-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe validation of the properties of table `
` has been violated: \nFor more details see [DELTA\\_VIOLATE\\_TABLE\\_PROPERTY\\_VALIDATION\\_FAILED](https://docs.databricks.com/error-messages/delta-violate-table-property-validation-failed-error-class.html) \n### DELTA\\_WRITE\\_INTO\\_VIEW\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` is a view. You may not write data into a view. \n### DELTA\\_ZORDERING\\_COLUMN\\_DOES\\_NOT\\_EXIST \n[SQLSTATE: 42703](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nZ-Ordering column `` does not exist in data schema. \n### DELTA\\_ZORDERING\\_ON\\_COLUMN\\_WITHOUT\\_STATS \n[SQLSTATE: KD00D](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nZ-Ordering on `` will be \nineffective, because we currently do not collect stats for these columns. Please refer to \n`` \nfor more information on data skipping and z-ordering. You can disable \nthis check by setting \n\u2018%%sql set `` = false\u2019 \n### DELTA\\_ZORDERING\\_ON\\_PARTITION\\_COLUMN \n[SQLSTATE: 42P10](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \n`` is a partition column. Z-Ordering can only be performed on data columns\n\n", "chunk_id": "956a2221a68dc0f1afafa6079ac852f6", "url": "https://docs.databricks.com/error-messages/error-classes.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### Error classes in Databricks\n##### Autoloader\n\n### CF\\_ADD\\_NEW\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nSchema evolution mode `` is not supported when the schema is specified. To use this mode, you can provide the schema through `cloudFiles.schemaHints` instead. 
\n### CF\\_AMBIGUOUS\\_AUTH\\_OPTIONS\\_ERROR \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFound notification-setup authentication options for the (default) directory \nlisting mode: \n`` \nIf you wish to use the file notification mode, please explicitly set: \n.option(\u201ccloudFiles.``\u201d, \u201ctrue\u201d) \nAlternatively, if you want to skip the validation of your options and ignore these \nauthentication options, you can set: \n.option(\u201ccloudFiles.<ValidateOptionsKey>\u201d, \u201cfalse\u201d) \n### CF\\_AMBIGUOUS\\_INCREMENTAL\\_LISTING\\_MODE\\_ERROR \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nIncremental listing mode (cloudFiles.``) \nand file notification (cloudFiles.``) \nhave been enabled at the same time. \nPlease make sure that you select only one. \n### CF\\_AZURE\\_STORAGE\\_SUFFIXES\\_REQUIRED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nRequire adlsBlobSuffix and adlsDfsSuffix for Azure \n### CF\\_BUCKET\\_MISMATCH \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThe `` in the file event `` is different from expected by the source: ``. \n### CF\\_CANNOT\\_EVOLVE\\_SCHEMA\\_LOG\\_EMPTY \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot evolve schema when the schema log is empty. 
Schema log location: `` \n### CF\\_CANNOT\\_PARSE\\_QUEUE\\_MESSAGE \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot parse the following queue message: `` \n### CF\\_CANNOT\\_RESOLVE\\_CONTAINER\\_NAME \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot resolve container name from path: ``, Resolved uri: `` \n### CF\\_CANNOT\\_RUN\\_DIRECTORY\\_LISTING \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot run directory listing when there is an async backfill thread running \n### CF\\_CLEAN\\_SOURCE\\_ALLOW\\_OVERWRITES\\_BOTH\\_ON \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot turn on cloudFiles.cleanSource and cloudFiles.allowOverwrites at the same time. \n### CF\\_CLEAN\\_SOURCE\\_UNAUTHORIZED\\_WRITE\\_PERMISSION \n[SQLSTATE: 42501](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nAuto Loader cannot delete processed files because it does not have write permissions to the source directory. \n`` \nTo fix you can either: \n1. Grant write permissions to the source directory OR\n2. Set cleanSource to \u2018OFF\u2019 \nYou could also unblock your stream by setting the SQLConf spark.databricks.cloudFiles.cleanSource.disabledDueToAuthorizationErrors to \u2018true\u2019. \n### CF\\_DUPLICATE\\_COLUMN\\_IN\\_DATA \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThere was an error when trying to infer the partition schema of your table. You have the same column duplicated in your data and partition paths. 
To ignore the partition value, please provide your partition columns explicitly by using: .option(\u201ccloudFiles.``\u201d, \u201c{comma-separated-list}\u201d) \n### CF\\_EMPTY\\_DIR\\_FOR\\_SCHEMA\\_INFERENCE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot infer schema when the input path `` is empty. Please try to start the stream when there are files in the input path, or specify the schema. \n### CF\\_EVENT\\_GRID\\_AUTH\\_ERROR \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to create an Event Grid subscription. Please make sure that your service \nprincipal has `` Event Grid Subscriptions. See more details at: \n`` \n### CF\\_EVENT\\_GRID\\_CREATION\\_FAILED \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to create event grid subscription. Please ensure that Microsoft.EventGrid is \nregistered as resource provider in your subscription. See more details at: \n`` \n### CF\\_EVENT\\_GRID\\_NOT\\_FOUND\\_ERROR \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to create an Event Grid subscription. Please make sure that your storage \naccount (``) is under your resource group (``) and that \nthe storage account is a \u201cStorageV2 (general purpose v2)\u201d account. See more details at: \n`` \n### CF\\_EVENT\\_NOTIFICATION\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nAuto Loader event notification mode is not supported for ``. 
\n### CF\\_FAILED\\_TO\\_CHECK\\_STREAM\\_NEW \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to check if the stream is new \n### CF\\_FAILED\\_TO\\_CREATED\\_PUBSUB\\_SUBSCRIPTION \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to create subscription: ``. A subscription with the same name already exists and is associated with another topic: ``. The desired topic is ``. Either delete the existing subscription or create a subscription with a new resource suffix. \n### CF\\_FAILED\\_TO\\_CREATED\\_PUBSUB\\_TOPIC \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to create topic: ``. A topic with the same name already exists.`` Remove the existing topic or try again with another resource suffix \n### CF\\_FAILED\\_TO\\_DELETE\\_GCP\\_NOTIFICATION \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to delete notification with id `` on bucket `` for topic ``. Please retry or manually remove the notification through the GCP console. \n### CF\\_FAILED\\_TO\\_DESERIALIZE\\_PERSISTED\\_SCHEMA \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to deserialize persisted schema from string: \u2018``\u2019 \n### CF\\_FAILED\\_TO\\_EVOLVE\\_SCHEMA \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nCannot evolve schema without a schema log. 
\n### CF\\_FAILED\\_TO\\_FIND\\_PROVIDER \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to find provider for `` \n### CF\\_FAILED\\_TO\\_INFER\\_SCHEMA \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to infer schema for format `` from existing files in input path ``. Please ensure you configured the options properly or explicitly specify the schema. \n### CF\\_FAILED\\_TO\\_WRITE\\_TO\\_SCHEMA\\_LOG \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to write to the schema log at location ``. \n### CF\\_FILE\\_FORMAT\\_REQUIRED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCould not find required option: cloudFiles.format. \n### CF\\_FOUND\\_MULTIPLE\\_AUTOLOADER\\_PUBSUB\\_SUBSCRIPTIONS \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFound multiple (``) subscriptions with the Auto Loader prefix for topic ``: \n`` \nThere should only be one subscription per topic. Please manually ensure that your topic does not have multiple subscriptions. \n### CF\\_GCP\\_AUTHENTICATION \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPlease either provide all of the following: ``, ``, \n``, and `` or provide none of them in order to use the default \nGCP credential provider chain for authenticating with GCP resources. \n### CF\\_GCP\\_LABELS\\_COUNT\\_EXCEEDED \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nReceived too many labels (``) for GCP resource. The maximum label count per resource is ``. 
\n### CF\\_GCP\\_RESOURCE\\_TAGS\\_COUNT\\_EXCEEDED \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nReceived too many resource tags (``) for GCP resource. The maximum resource tag count per resource is ``, as resource tags are stored as GCP labels on resources, and Databricks specific tags consume some of this label quota. \n### CF\\_INCOMPLETE\\_LOG\\_FILE\\_IN\\_SCHEMA\\_LOG \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nIncomplete log file in the schema log at path `` \n### CF\\_INCOMPLETE\\_METADATA\\_FILE\\_IN\\_CHECKPOINT \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nIncomplete metadata file in the Auto Loader checkpoint \n### CF\\_INCORRECT\\_SQL\\_PARAMS \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe cloud\\_files method accepts two required string parameters: the path to load from, and the file format. File reader options must be provided in a string key-value map. e.g. cloud\\_files(\u201cpath\u201d, \u201cjson\u201d, map(\u201coption1\u201d, \u201cvalue1\u201d)). Received: `` \n### [CF\\_INTERNAL\\_ERROR](https://docs.databricks.com/error-messages/cf-internal-error-error-class.html) \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInternal error. 
\nFor more details see [CF\\_INTERNAL\\_ERROR](https://docs.databricks.com/error-messages/cf-internal-error-error-class.html) \n### CF\\_INVALID\\_ARN \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInvalid ARN: `` \n### CF\\_INVALID\\_CHECKPOINT \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThis checkpoint is not a valid CloudFiles source \n### CF\\_INVALID\\_CLEAN\\_SOURCE\\_MODE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInvalid mode for clean source option ``. \n### CF\\_INVALID\\_GCP\\_RESOURCE\\_TAG\\_KEY \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInvalid resource tag key for GCP resource: ``. Keys must start with a lowercase letter, be within 1 to 63 characters long, and contain only lowercase letters, numbers, underscores (\\_), and hyphens (-). \n### CF\\_INVALID\\_GCP\\_RESOURCE\\_TAG\\_VALUE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nInvalid resource tag value for GCP resource: ``. Values must be within 0 to 63 characters long and must contain only lowercase letters, numbers, underscores (\\_), and hyphens (-). \n### CF\\_INVALID\\_MANAGED\\_FILE\\_EVENTS\\_OPTION\\_KEYS \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nAuto Loader does not support the following options when used with managed file events: \n`` \nWe recommend that you remove these options and then restart the stream. 
\n### [CF\\_INVALID\\_MANAGED\\_FILE\\_EVENTS\\_RESPONSE](https://docs.databricks.com/error-messages/cf-invalid-managed-file-events-response-error-class.html) \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nInvalid response from managed file events service. Please contact Databricks support for assistance. \nFor more details see [CF\\_INVALID\\_MANAGED\\_FILE\\_EVENTS\\_RESPONSE](https://docs.databricks.com/error-messages/cf-invalid-managed-file-events-response-error-class.html) \n### CF\\_INVALID\\_SCHEMA\\_EVOLUTION\\_MODE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \ncloudFiles.`` must be one of { \n\u201c``\u201d \n\u201c``\u201d \n\u201c``\u201d \n\u201c``\u201d} \n### CF\\_INVALID\\_SCHEMA\\_HINTS\\_OPTION \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nSchema hints can only specify a particular column once. \nIn this case, redefining column: `` \nmultiple times in schemaHints: \n`` \n### CF\\_INVALID\\_SCHEMA\\_HINT\\_COLUMN \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nSchema hints can not be used to override maps\u2019 and arrays\u2019 nested types. \nConflicted column: `` \n### CF\\_LATEST\\_OFFSET\\_READ\\_LIMIT\\_REQUIRED \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nlatestOffset should be called with a ReadLimit on this source. \n### CF\\_LOG\\_FILE\\_MALFORMED \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nLog file was malformed: failed to read correct log version from ``. 
\n### CF\\_MANAGED\\_FILE\\_EVENTS\\_BACKFILL\\_IN\\_PROGRESS \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nYou have requested Auto Loader to ignore existing files in your external location by setting includeExistingFiles to false. However, the managed file events service is still discovering existing files in your external location. Please try again after managed file events has completed discovering all files in your external location. \n### CF\\_MANAGED\\_FILE\\_EVENTS\\_ENDPOINT\\_NOT\\_FOUND \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nYou are using Auto Loader with managed file events, but it appears that the external location for your input path \u2018``\u2019 does not have file events enabled or the input path is invalid. Please request your Databricks Administrator to enable file events on the external location for your input path. \n### CF\\_MANAGED\\_FILE\\_EVENTS\\_ENDPOINT\\_PERMISSION\\_DENIED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nYou are using Auto Loader with managed file events, but you do not have access to the external location or volume for input path \u2018``\u2019 or the input path is invalid. Please request your Databricks Administrator to grant read permissions for the external location or volume or provide a valid input path within an existing external location or volume. \n### CF\\_MANAGED\\_FILE\\_EVENTS\\_ONLY\\_ON\\_SERVERLESS \n[SQLSTATE: 56038](https://docs.databricks.com/error-messages/sqlstates.html#class-56-miscellaneous-sql-or-product-error) \nAuto Loader with managed file events is only available on Databricks serverless. To continue, please move this workload to Databricks serverless or turn off the cloudFiles.useManagedFileEvents option. 
\n### CF\\_MAX\\_MUST\\_BE\\_POSITIVE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nmax must be positive \n### CF\\_METADATA\\_FILE\\_CONCURRENTLY\\_USED \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nMultiple streaming queries are concurrently using `` \n### CF\\_MISSING\\_METADATA\\_FILE\\_ERROR \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe metadata file in the streaming source checkpoint directory is missing. This metadata \nfile contains important default options for the stream, so the stream cannot be restarted \nright now. Please contact Databricks support for assistance. \n### CF\\_MISSING\\_PARTITION\\_COLUMN\\_ERROR \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPartition column `` does not exist in the provided schema: \n`` \n### CF\\_MISSING\\_SCHEMA\\_IN\\_PATHLESS\\_MODE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPlease specify a schema using .schema() if a path is not provided to the CloudFiles source while using file notification mode. Alternatively, to have Auto Loader to infer the schema please provide a base path in .load(). \n### CF\\_MULTIPLE\\_PUBSUB\\_NOTIFICATIONS\\_FOR\\_TOPIC \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFound existing notifications for topic `` on bucket ``: \nnotification,id \n`` \nTo avoid polluting the subscriber with unintended events, please delete the above notifications and retry. 
\n### CF\\_NEW\\_PARTITION\\_ERROR \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nNew partition columns were inferred from your files: [``]. Please provide all partition columns in your schema or provide a list of partition columns which you would like to extract values for by using: .option(\u201ccloudFiles.partitionColumns\u201d, \u201c{comma-separated-list|empty-string}\u201d) \n### CF\\_PARTITON\\_INFERENCE\\_ERROR \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nThere was an error when trying to infer the partition schema of the current batch of files. Please provide your partition columns explicitly by using: .option(\u201ccloudFiles.``\u201d, \u201c{comma-separated-list}\u201d) \n### CF\\_PATH\\_DOES\\_NOT\\_EXIST\\_FOR\\_READ\\_FILES \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCannot read files when the input path `` does not exist. Please make sure the input path exists and re-try. \n### CF\\_PERIODIC\\_BACKFILL\\_NOT\\_SUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nPeriodic backfill is not supported if asynchronous backfill is disabled. You can enable asynchronous backfill/directory listing by setting `spark.databricks.cloudFiles.asyncDirListing` to true \n### CF\\_PREFIX\\_MISMATCH \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFound mismatched event: key `` doesn\u2019t have the prefix: `` \n### CF\\_PROTOCOL\\_MISMATCH \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \n`` \nIf you don\u2019t need to make any other changes to your code, then please set the SQL \nconfiguration: \u2018`` = ``\u2019 \nto resume your stream. Please refer to: \n`` \nfor more details. 
\n### CF\\_REGION\\_NOT\\_FOUND\\_ERROR \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nCould not get default AWS Region. Please specify a region using the cloudFiles.region option. \n### CF\\_RESOURCE\\_SUFFIX\\_EMPTY \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to create notification services: the resource suffix cannot be empty. \n### CF\\_RESOURCE\\_SUFFIX\\_INVALID\\_CHAR\\_AWS \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to create notification services: the resource suffix can only have alphanumeric characters, hyphens (-) and underscores (\\_). \n### CF\\_RESOURCE\\_SUFFIX\\_INVALID\\_CHAR\\_AZURE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to create notification services: the resource suffix can only have lowercase letter, number, and dash (-). \n### CF\\_RESOURCE\\_SUFFIX\\_INVALID\\_CHAR\\_GCP \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to create notification services: the resource suffix can only have alphanumeric characters, hyphens (-), underscores (\\_), periods (.), tildes (~) plus signs (+), and percent signs (``). \n### CF\\_RESOURCE\\_SUFFIX\\_LIMIT \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to create notification services: the resource suffix cannot have more than `` characters. 
\n### CF\\_RESOURCE\\_SUFFIX\\_LIMIT\\_GCP \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFailed to create notification services: the resource suffix must be between `` and `` characters. \n### CF\\_RESTRICTED\\_GCP\\_RESOURCE\\_TAG\\_KEY \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFound restricted GCP resource tag key (``). The following GCP resource tag keys are restricted for Auto Loader: [``] \n### CF\\_RETENTION\\_GREATER\\_THAN\\_MAX\\_FILE\\_AGE \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \ncloudFiles.cleanSource.retentionDuration cannot be greater than cloudFiles.maxFileAge. \n### CF\\_SAME\\_PUB\\_SUB\\_TOPIC\\_NEW\\_KEY\\_PREFIX \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nFailed to create notification for topic: `` with prefix: ``. There is already a topic with the same name with another prefix: ``. Try using a different resource suffix for setup or delete the existing setup. \n### CF\\_SOURCE\\_DIRECTORY\\_PATH\\_REQUIRED \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nPlease provide the source directory path with option `path` \n### CF\\_SOURCE\\_UNSUPPORTED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe cloud files source only supports S3, Azure Blob Storage (wasb/wasbs) and Azure Data Lake Gen1 (adl) and Gen2 (abfs/abfss) paths right now. path: \u2018``\u2019, resolved uri: \u2018``\u2019 \n### CF\\_STATEFUL\\_STREAMING\\_SCHEMA\\_EVOLUTION \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nStateful streaming queries do not support schema evolution. 
Please set the option \u201ccloudFiles.schemaEvolutionMode\u201d to \u201crescue\u201d or \u201cnone\u201d. \n### CF\\_STATE\\_INCORRECT\\_SQL\\_PARAMS \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe cloud\\_files\\_state function accepts a string parameter representing the checkpoint directory of a cloudFiles stream or a multi-part tableName identifying a streaming table, and an optional second integer parameter representing the checkpoint version to load state for. The second parameter may also be \u2018latest\u2019 to read the latest checkpoint. Received: `` \n### CF\\_STATE\\_INVALID\\_CHECKPOINT\\_PATH \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe input checkpoint path `` is invalid. Either the path does not exist or there are no cloud\\_files sources found. \n### CF\\_STATE\\_INVALID\\_VERSION \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nThe specified version `` does not exist, or was removed during analysis. \n### CF\\_THREAD\\_IS\\_DEAD \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \n`` thread is dead. \n### CF\\_UNABLE\\_TO\\_DERIVE\\_STREAM\\_CHECKPOINT\\_LOCATION \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to derive the stream checkpoint location from the source checkpoint location: `` \n### CF\\_UNABLE\\_TO\\_DETECT\\_FILE\\_FORMAT \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to detect the source file format from `` sampled file(s), found ``. Please specify the format. 
\n### CF\\_UNABLE\\_TO\\_EXTRACT\\_BUCKET\\_INFO \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to extract bucket information. Path: \u2018``\u2019, resolved uri: \u2018``\u2019. \n### CF\\_UNABLE\\_TO\\_EXTRACT\\_KEY\\_INFO \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to extract key information. Path: \u2018``\u2019, resolved uri: \u2018``\u2019. \n### CF\\_UNABLE\\_TO\\_EXTRACT\\_STORAGE\\_ACCOUNT\\_INFO \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nUnable to extract storage account information; path: \u2018``\u2019, resolved uri: \u2018``\u2019 \n### CF\\_UNABLE\\_TO\\_LIST\\_EFFICIENTLY \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nReceived a directory rename event for the path ``, but we are unable to list this directory efficiently. In order for the stream to continue, set the option \u2018cloudFiles.ignoreDirRenames\u2019 to true, and consider enabling regular backfills with cloudFiles.backfillInterval for this data to be processed. \n### CF\\_UNEXPECTED\\_READ\\_LIMIT \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nUnexpected ReadLimit: `` \n### CF\\_UNKNOWN\\_OPTION\\_KEYS\\_ERROR \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nFound unknown option keys: \n`` \nPlease make sure that all provided option keys are correct. 
If you want to skip the \nvalidation of your options and ignore these unknown options, you can set: \n.option(\u201ccloudFiles.``\u201d, \u201cfalse\u201d) \n### CF\\_UNKNOWN\\_READ\\_LIMIT \n[SQLSTATE: 22000](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nUnknown ReadLimit: `` \n### CF\\_UNSUPPORTED\\_CLOUD\\_FILES\\_SQL\\_FUNCTION \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe SQL function \u2018cloud\\_files\u2019 to create an Auto Loader streaming source is supported only in a Delta Live Tables pipeline. See more details at: \n`` \n### CF\\_UNSUPPORTED\\_FORMAT\\_FOR\\_SCHEMA\\_INFERENCE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nSchema inference is not supported for format: ``. Please specify the schema. \n### CF\\_UNSUPPORTED\\_LOG\\_VERSION \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nUnsupportedLogVersion: maximum supported log version is v``, but encountered v``. The log file was produced by a newer version of DBR and cannot be read by this version. Please upgrade. \n### CF\\_UNSUPPORTED\\_SCHEMA\\_EVOLUTION\\_MODE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nSchema evolution mode `` is not supported for format: ``. Please set the schema evolution mode to \u2018none\u2019. \n### CF\\_USE\\_DELTA\\_FORMAT \n[SQLSTATE: 42000](https://docs.databricks.com/error-messages/sqlstates.html#class-42-syntax-error-or-access-rule-violation) \nReading from a Delta table is not supported with this syntax. If you would like to consume data from Delta, please refer to the docs: read a Delta table (``), or read a Delta table as a stream source (``). 
The streaming source from Delta is already optimized for incremental consumption of data.\n\n", "chunk_id": "d603d4a61ef4d32a067678a215b6f2f5", "url": "https://docs.databricks.com/error-messages/error-classes.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### Error classes in Databricks\n##### Geospatial\n\n### EWKB\\_PARSE\\_ERROR \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nError parsing EWKB: `` at position `` \n### [GEOJSON\\_PARSE\\_ERROR](https://docs.databricks.com/error-messages/geojson-parse-error-error-class.html) \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nError parsing GeoJSON: `` at position `` \nFor more details see [GEOJSON\\_PARSE\\_ERROR](https://docs.databricks.com/error-messages/geojson-parse-error-error-class.html) \n### [H3\\_INVALID\\_CELL\\_ID](https://docs.databricks.com/error-messages/h3-invalid-cell-id-error-class.html) \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \n`` is not a valid H3 cell ID \nFor more details see [H3\\_INVALID\\_CELL\\_ID](https://docs.databricks.com/error-messages/h3-invalid-cell-id-error-class.html) \n### [H3\\_INVALID\\_GRID\\_DISTANCE\\_VALUE](https://docs.databricks.com/error-messages/h3-invalid-grid-distance-value-error-class.html) \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nH3 grid distance `` must be non-negative \nFor more details see [H3\\_INVALID\\_GRID\\_DISTANCE\\_VALUE](https://docs.databricks.com/error-messages/h3-invalid-grid-distance-value-error-class.html) \n### [H3\\_INVALID\\_RESOLUTION\\_VALUE](https://docs.databricks.com/error-messages/h3-invalid-resolution-value-error-class.html) \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nH3 resolution `` must be 
between `` and ``, inclusive \nFor more details see [H3\\_INVALID\\_RESOLUTION\\_VALUE](https://docs.databricks.com/error-messages/h3-invalid-resolution-value-error-class.html) \n### [H3\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/h3-not-enabled-error-class.html) \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` is disabled or unsupported. Consider enabling Photon or switch to a tier that supports H3 expressions \nFor more details see [H3\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/h3-not-enabled-error-class.html) \n### H3\\_PENTAGON\\_ENCOUNTERED\\_ERROR \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nA pentagon was encountered while computing the hex ring of `` with grid distance `` \n### H3\\_UNDEFINED\\_GRID\\_DISTANCE \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nH3 grid distance between `` and `` is undefined \n### ST\\_DIFFERENT\\_SRID\\_VALUES \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nArguments to \u201c``\u201d must have the same SRID value. 
SRID values found: ``, `` \n### ST\\_INVALID\\_ARGUMENT \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \n\u201c``\u201d: `` \n### ST\\_INVALID\\_ARGUMENT\\_TYPE \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nArgument to \u201c``\u201d must be of type `` \n### ST\\_INVALID\\_CRS\\_TRANSFORMATION\\_ERROR \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \n``: Invalid or unsupported CRS transformation from SRID `` to SRID `` \n### ST\\_INVALID\\_ENDIANNESS\\_VALUE \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nEndianness `` must be \u2018NDR\u2019 (little-endian) or \u2018XDR\u2019 (big-endian) \n### ST\\_INVALID\\_GEOHASH\\_VALUE \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \n``: Invalid geohash value: \u2018``\u2019. Geohash values must be valid lowercase base32 strings as described in \n### ST\\_INVALID\\_PRECISION\\_VALUE \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nPrecision `), please use `%fs ls `\n\n# [[1]]\n# [[1]]$path\n# [1] \"/Volumes/main/default/my-volume/data.csv\"\n\n# [[1]]$name\n# [1] \"data.csv\"\n\n# [[1]]$size\n# [1] 2258987\n\n# [[1]]$isDir\n# [1] FALSE\n\n# [[1]]$isFile\n# [1] TRUE\n\n# [[1]]$modificationTime\n# [1] \"1711357839000\"\n\n``` \n```\ndbutils.fs.ls(\"/Volumes/main/default/my-volume/\")\n\n// res6: Seq[com.databricks.backend.daemon.dbutils.FileInfo] = WrappedArray(FileInfo(/Volumes/main/default/my-volume/data.csv, 2258987, 1711357839000))\n\n``` \n### mkdirs command (dbutils.fs.mkdirs) \nCreates the given directory if it does not exist. Also creates any necessary parent directories. \nTo display help for this command, run `dbutils.fs.help(\"mkdirs\")`. 
\nThis example creates the directory `my-data` within `/Volumes/main/default/my-volume/`. \n```\ndbutils.fs.mkdirs(\"/Volumes/main/default/my-volume/my-data\")\n\n# Out[15]: True\n\n``` \n```\ndbutils.fs.mkdirs(\"/Volumes/main/default/my-volume/my-data\")\n\n# [1] TRUE\n\n``` \n```\ndbutils.fs.mkdirs(\"/Volumes/main/default/my-volume/my-data\")\n\n// res7: Boolean = true\n\n``` \n### mount command (dbutils.fs.mount) \nMounts the specified source directory into DBFS at the specified mount point. \nTo display help for this command, run `dbutils.fs.help(\"mount\")`. \n```\naws_bucket_name = \"my-bucket\"\nmount_name = \"s3-my-bucket\"\n\ndbutils.fs.mount(\"s3a://%s\" % aws_bucket_name, \"/mnt/%s\" % mount_name)\n\n``` \n```\nval AwsBucketName = \"my-bucket\"\nval MountName = \"s3-my-bucket\"\n\ndbutils.fs.mount(s\"s3a://$AwsBucketName\", s\"/mnt/$MountName\")\n\n``` \nFor additional code examples, see [Connect to Amazon S3](https://docs.databricks.com/connect/storage/amazon-s3.html). \n### mounts command (dbutils.fs.mounts) \nDisplays information about what is currently mounted within DBFS. \nTo display help for this command, run `dbutils.fs.help(\"mounts\")`. \nWarning \nCall `dbutils.fs.refreshMounts()` on all other running clusters to propagate the new mount. See [refreshMounts command (dbutils.fs.refreshMounts)](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-refreshmounts). \n```\ndbutils.fs.mounts()\n\n# Out[11]: [MountInfo(mountPoint='/mnt/databricks-results', source='databricks-results', encryptionType='sse-s3')]\n\n``` \n```\ndbutils.fs.mounts()\n\n``` \nFor additional code examples, see [Connect to Amazon S3](https://docs.databricks.com/connect/storage/amazon-s3.html). \n### mv command (dbutils.fs.mv) \nMoves a file or directory, possibly across filesystems. A move is a copy followed by a delete, even for moves within filesystems. \nTo display help for this command, run `dbutils.fs.help(\"mv\")`. 
\nThis example moves the file `rows.csv` from `/Volumes/main/default/my-volume/` to `/Volumes/main/default/my-volume/my-data/`. \n```\ndbutils.fs.mv(\"/Volumes/main/default/my-volume/rows.csv\", \"/Volumes/main/default/my-volume/my-data/\")\n\n# Out[2]: True\n\n``` \n```\ndbutils.fs.mv(\"/Volumes/main/default/my-volume/rows.csv\", \"/Volumes/main/default/my-volume/my-data/\")\n\n# [1] TRUE\n\n``` \n```\ndbutils.fs.mv(\"/Volumes/main/default/my-volume/rows.csv\", \"/Volumes/main/default/my-volume/my-data/\")\n\n// res1: Boolean = true\n\n``` \n### put command (dbutils.fs.put) \nWrites the specified string to a file. The string is UTF-8 encoded. \nTo display help for this command, run `dbutils.fs.help(\"put\")`. \nThis example writes the string `Hello, Databricks!` to a file named `hello.txt` in `/Volumes/main/default/my-volume/`. If the file exists, it will be overwritten. \n```\ndbutils.fs.put(\"/Volumes/main/default/my-volume/hello.txt\", \"Hello, Databricks!\", True)\n\n# Wrote 18 bytes.\n# Out[6]: True\n\n``` \n```\ndbutils.fs.put(\"/Volumes/main/default/my-volume/hello.txt\", \"Hello, Databricks!\", TRUE)\n\n# [1] TRUE\n\n``` \n```\ndbutils.fs.put(\"/Volumes/main/default/my-volume/hello.txt\", \"Hello, Databricks!\", true)\n\n// Wrote 18 bytes.\n// res2: Boolean = true\n\n``` \n### refreshMounts command (dbutils.fs.refreshMounts) \nForces all machines in the cluster to refresh their mount cache, ensuring they receive the most recent information. \nTo display help for this command, run `dbutils.fs.help(\"refreshMounts\")`. \n```\ndbutils.fs.refreshMounts()\n\n``` \n```\ndbutils.fs.refreshMounts()\n\n``` \nFor additional code examples, see [Connect to Amazon S3](https://docs.databricks.com/connect/storage/amazon-s3.html). \n### rm command (dbutils.fs.rm) \nRemoves a file or directory and optionally all of its contents. If a file is specified, the recurse parameter is ignored. 
If a directory is specified, an error occurs if recurse is disabled and the directory is not empty. \nTo display help for this command, run `dbutils.fs.help(\"rm\")`. \nThis example removes the directory `/Volumes/main/default/my-volume/my-data/` including the contents of the directory. \n```\ndbutils.fs.rm(\"/Volumes/main/default/my-volume/my-data/\", True)\n\n# Out[8]: True\n\n``` \n```\ndbutils.fs.rm(\"/Volumes/main/default/my-volume/my-data/\", TRUE)\n\n# [1] TRUE\n\n``` \n```\ndbutils.fs.rm(\"/Volumes/main/default/my-volume/my-data/\", true)\n\n// res6: Boolean = true\n\n``` \n### unmount command (dbutils.fs.unmount) \nDeletes a DBFS mount point. \nWarning \nTo avoid errors, never modify a mount point while other jobs are reading or writing to it. After modifying a mount, always run `dbutils.fs.refreshMounts()` on all other running clusters to propagate any mount updates. See [refreshMounts command (dbutils.fs.refreshMounts)](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-refreshmounts). \nTo display help for this command, run `dbutils.fs.help(\"unmount\")`. \n```\ndbutils.fs.unmount(\"/mnt/\")\n\n``` \nFor additional code examples, see [Connect to Amazon S3](https://docs.databricks.com/connect/storage/amazon-s3.html). \n### updateMount command (dbutils.fs.updateMount) \nSimilar to the `dbutils.fs.mount` command, but updates an existing mount point instead of creating a new one. Returns an error if the mount point is not present. \nTo display help for this command, run `dbutils.fs.help(\"updateMount\")`. \nWarning \nTo avoid errors, never modify a mount point while other jobs are reading or writing to it. After modifying a mount, always run `dbutils.fs.refreshMounts()` on all other running clusters to propagate any mount updates. See [refreshMounts command (dbutils.fs.refreshMounts)](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-refreshmounts). 
\nThis command is available in Databricks Runtime 10.4 LTS and above. \n```\naws_bucket_name = \"my-bucket\"\nmount_name = \"s3-my-bucket\"\n\ndbutils.fs.updateMount(\"s3a://%s\" % aws_bucket_name, \"/mnt/%s\" % mount_name)\n\n``` \n```\nval AwsBucketName = \"my-bucket\"\nval MountName = \"s3-my-bucket\"\n\ndbutils.fs.updateMount(s\"s3a://$AwsBucketName\", s\"/mnt/$MountName\")\n\n```\n\n", "chunk_id": "5de32a1c4d8e666a6be96732608e8df2", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n##### Jobs utility (dbutils.jobs)\n\n**Subutilities**: [taskValues](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-jobs-taskvalues) \nNote \nThis utility is available only for Python. \nThe jobs utility allows you to leverage jobs features. To display help for this utility, run `dbutils.jobs.help()`. \n```\nProvides utilities for leveraging jobs features.\n\ntaskValues: TaskValuesUtils -> Provides utilities for leveraging job task values\n\n``` \n### taskValues subutility (dbutils.jobs.taskValues) \n**Commands**: [get](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-jobs-taskvalues-get), [set](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-jobs-taskvalues-set) \nNote \nThis subutility is available only for Python. \nProvides commands for leveraging job task values. \nUse this sub utility to set and get arbitrary values during a job run. These values are called *task values*. You can access task values in downstream tasks in the same job run. For example, you can communicate identifiers or metrics, such as information about the evaluation of a machine learning model, between different tasks within a job run. Each task can set multiple task values, get them, or both. Each task value has a unique key within the same task. This unique key is known as the task value\u2019s key. 
A task value is accessed with the task name and the task value\u2019s key. \nTo display help for this subutility, run `dbutils.jobs.taskValues.help()`. \n#### get command (dbutils.jobs.taskValues.get) \nNote \nThis command is available only for Python. \nOn Databricks Runtime 10.4 and earlier, if `get` cannot find the task, a [Py4JJavaError](https://www.py4j.org/py4j_java_protocol.html#py4jjavaerror) is raised instead of a `ValueError`. \nGets the contents of the specified task value for the specified task in the current job run. \nTo display help for this command, run `dbutils.jobs.taskValues.help(\"get\")`. \nFor example: \n```\ndbutils.jobs.taskValues.get(taskKey = \"my-task\", \\\nkey = \"my-key\", \\\ndefault = 7, \\\ndebugValue = 42)\n\n``` \nIn the preceding example: \n* `taskKey` is the name of the task that set the task value. If the command cannot find this task, a `ValueError` is raised.\n* `key` is the name of the task value\u2019s key that you set with the [set command (dbutils.jobs.taskValues.set)](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-jobs-taskvalues-set). If the command cannot find this task value\u2019s key, a `ValueError` is raised (unless `default` is specified).\n* `default` is an optional value that is returned if `key` cannot be found. `default` cannot be `None`.\n* `debugValue` is an optional value that is returned if you try to get the task value from within a notebook that is running outside of a job. This can be useful during debugging when you want to run your notebook manually and return some value instead of raising a `TypeError` by default. `debugValue` cannot be `None`. \nIf you try to get a task value from within a notebook that is running outside of a job, this command raises a `TypeError` by default. However, if the `debugValue` argument is specified in the command, the value of `debugValue` is returned instead of raising a `TypeError`. 
\n#### set command (dbutils.jobs.taskValues.set) \nNote \nThis command is available only for Python. \nSets or updates a task value. You can set up to 250 task values for a job run. \nTo display help for this command, run `dbutils.jobs.taskValues.help(\"set\")`. \nSome examples include: \n```\ndbutils.jobs.taskValues.set(key = \"my-key\", \\\nvalue = 5)\n\ndbutils.jobs.taskValues.set(key = \"my-other-key\", \\\nvalue = \"my other value\")\n\n``` \nIn the preceding examples: \n* `key` is the task value\u2019s key. This key must be unique to the task. That is, if two different tasks each set a task value with key `K`, these are two different task values that have the same key `K`.\n* `value` is the value for this task value\u2019s key. This command must be able to represent the value internally in JSON format. The size of the JSON representation of the value cannot exceed 48 KiB. \nIf you try to set a task value from within a notebook that is running outside of a job, this command does nothing.\n\n", "chunk_id": "d0c596fad351e34473dccdd988ac4b3c", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n##### Library utility (dbutils.library)\n\nMost methods in the `dbutils.library` submodule are deprecated. See [Library utility (dbutils.library) (legacy)](https://docs.databricks.com/archive/dev-tools/dbutils-library.html). \nYou might need to programmatically restart the Python process on Databricks to ensure that locally installed or upgraded libraries function correctly in the Python kernel for your current SparkSession. To do this, run the `dbutils.library.restartPython` command. 
See [Restart the Python process on Databricks](https://docs.databricks.com/libraries/restart-python-process.html).\n\n", "chunk_id": "188bd5bb7c07ec29e4c6b2e6c756dd81", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n##### Notebook utility (dbutils.notebook)\n\n**Commands**: [exit](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-workflow-exit), [run](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-workflow-run) \nThe notebook utility allows you to chain together notebooks and act on their results. See [Run a Databricks notebook from another notebook](https://docs.databricks.com/notebooks/notebook-workflows.html). \nTo list the available commands, run `dbutils.notebook.help()`. \n```\nexit(value: String): void -> This method lets you exit a notebook with a value\nrun(path: String, timeoutSeconds: int, arguments: Map): String -> This method runs a notebook and returns its exit value.\n\n``` \n### exit command (dbutils.notebook.exit) \nExits a notebook with a value. \nTo display help for this command, run `dbutils.notebook.help(\"exit\")`. \nThis example exits the notebook with the value `Exiting from My Other Notebook`. \n```\ndbutils.notebook.exit(\"Exiting from My Other Notebook\")\n\n# Notebook exited: Exiting from My Other Notebook\n\n``` \n```\ndbutils.notebook.exit(\"Exiting from My Other Notebook\")\n\n# Notebook exited: Exiting from My Other Notebook\n\n``` \n```\ndbutils.notebook.exit(\"Exiting from My Other Notebook\")\n\n// Notebook exited: Exiting from My Other Notebook\n\n``` \nNote \nIf the run has a query with [structured streaming](https://docs.databricks.com/structured-streaming/production.html) running in the background, calling `dbutils.notebook.exit()` does not terminate the run. 
The run continues to execute for as long as the query is executing in the background. You can stop the query running in the background by clicking **Cancel** in the cell of the query or by running `query.stop()`. When the query stops, you can terminate the run with `dbutils.notebook.exit()`. \n### run command (dbutils.notebook.run) \nRuns a notebook and returns its exit value. The notebook will run in the current cluster by default. \nNote \nThe maximum length of the string value returned from the `run` command is 5 MB. See [Get the output for a single run](https://docs.databricks.com/api/workspace/jobs) (`GET /jobs/runs/get-output`). \nTo display help for this command, run `dbutils.notebook.help(\"run\")`. \nThis example runs a notebook named `My Other Notebook` in the same location as the calling notebook. The called notebook ends with the line of code `dbutils.notebook.exit(\"Exiting from My Other Notebook\")`. If the called notebook does not finish running within 60 seconds, an exception is thrown. 
\n```\ndbutils.notebook.run(\"My Other Notebook\", 60)\n\n# Out[14]: 'Exiting from My Other Notebook'\n\n``` \n```\ndbutils.notebook.run(\"My Other Notebook\", 60)\n\n// res2: String = Exiting from My Other Notebook\n\n```\n\n", "chunk_id": "25769b563d2b14ad64c070673c912087", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n##### Secrets utility (dbutils.secrets)\n\n**Commands**: [get](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-secrets-get), [getBytes](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-secrets-getbytes), [list](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-secrets-list), [listScopes](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-secrets-listscopes) \nThe secrets utility allows you to store and access sensitive credential information without making it visible in notebooks. See [Secret management](https://docs.databricks.com/security/secrets/index.html) and [Use the secrets in a notebook](https://docs.databricks.com/security/secrets/example-secret-workflow.html#secret-example-notebook). To list the available commands, run `dbutils.secrets.help()`. \n```\nget(scope: String, key: String): String -> Gets the string representation of a secret value with scope and key\ngetBytes(scope: String, key: String): byte[] -> Gets the bytes representation of a secret value with scope and key\nlist(scope: String): Seq -> Lists secret metadata for secrets within a scope\nlistScopes: Seq -> Lists secret scopes\n\n``` \n### get command (dbutils.secrets.get) \nGets the string representation of a secret value for the specified secrets scope and key. \nWarning \nAdministrators, secret creators, and users granted [permission](https://docs.databricks.com/security/secrets/secrets.html#permissions) can read Databricks secrets. 
While Databricks makes an effort to redact secret values that might be displayed in notebooks, it is not possible to prevent such users from reading secrets. For more information, see [Secret redaction](https://docs.databricks.com/security/secrets/redaction.html). \nTo display help for this command, run `dbutils.secrets.help(\"get\")`. \nThis example gets the string representation of the secret value for the scope named `my-scope` and the key named `my-key`. \n```\ndbutils.secrets.get(scope=\"my-scope\", key=\"my-key\")\n\n# Out[14]: '[REDACTED]'\n\n``` \n```\ndbutils.secrets.get(scope=\"my-scope\", key=\"my-key\")\n\n# [1] \"[REDACTED]\"\n\n``` \n```\ndbutils.secrets.get(scope=\"my-scope\", key=\"my-key\")\n\n// res0: String = [REDACTED]\n\n``` \n### getBytes command (dbutils.secrets.getBytes) \nGets the bytes representation of a secret value for the specified scope and key. \nTo display help for this command, run `dbutils.secrets.help(\"getBytes\")`. \nThis example gets the byte representation of the secret value (in this example, `a1!b2@c3#`) for the scope named `my-scope` and the key named `my-key`. \n```\ndbutils.secrets.getBytes(scope=\"my-scope\", key=\"my-key\")\n\n# Out[1]: b'a1!b2@c3#'\n\n``` \n```\ndbutils.secrets.getBytes(scope=\"my-scope\", key=\"my-key\")\n\n# [1] 61 31 21 62 32 40 63 33 23\n\n``` \n```\ndbutils.secrets.getBytes(scope=\"my-scope\", key=\"my-key\")\n\n// res1: Array[Byte] = Array(97, 49, 33, 98, 50, 64, 99, 51, 35)\n\n``` \n### list command (dbutils.secrets.list) \nLists the metadata for secrets within the specified scope. \nTo display help for this command, run `dbutils.secrets.help(\"list\")`. \nThis example lists the metadata for secrets within the scope named `my-scope`. 
\n```\ndbutils.secrets.list(\"my-scope\")\n\n# Out[10]: [SecretMetadata(key='my-key')]\n\n``` \n```\ndbutils.secrets.list(\"my-scope\")\n\n# [[1]]\n# [[1]]$key\n# [1] \"my-key\"\n\n``` \n```\ndbutils.secrets.list(\"my-scope\")\n\n// res2: Seq[com.databricks.dbutils_v1.SecretMetadata] = ArrayBuffer(SecretMetadata(my-key))\n\n``` \n### listScopes command (dbutils.secrets.listScopes) \nLists the available scopes. \nTo display help for this command, run `dbutils.secrets.help(\"listScopes\")`. \nThis example lists the available scopes. \n```\ndbutils.secrets.listScopes()\n\n# Out[14]: [SecretScope(name='my-scope')]\n\n``` \n```\ndbutils.secrets.listScopes()\n\n# [[1]]\n# [[1]]$name\n# [1] \"my-scope\"\n\n``` \n```\ndbutils.secrets.listScopes()\n\n// res3: Seq[com.databricks.dbutils_v1.SecretScope] = ArrayBuffer(SecretScope(my-scope))\n\n```\n\n", "chunk_id": "6eb9e8d4535aea74245f3f83980e5338", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n##### Widgets utility (dbutils.widgets)\n\n**Commands**: [combobox](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets-combobox), [dropdown](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets-dropdown), [get](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets-get), [getArgument](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets-getargument), [multiselect](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets-multiselect), [remove](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets-remove), [removeAll](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets-removeall), [text](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets-text) \nThe widgets utility allows you to parameterize notebooks. 
See [Databricks widgets](https://docs.databricks.com/notebooks/widgets.html). \nNote \nNotebook SQL magics like `CREATE WIDGET` and `REMOVE WIDGET` provide convenient ways to work with widgets in SQL notebook cells. \nTo list the available commands, run `dbutils.widgets.help()`. \n```\ncombobox(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a combobox input widget with a given name, default value and choices\ndropdown(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a dropdown input widget a with given name, default value and choices\nget(name: String): String -> Retrieves current value of an input widget\ngetAll: map -> Retrieves a map of all widget names and their values\ngetArgument(name: String, optional: String): String -> (DEPRECATED) Equivalent to get\nmultiselect(name: String, defaultValue: String, choices: Seq, label: String): void -> Creates a multiselect input widget with a given name, default value and choices\nremove(name: String): void -> Removes an input widget from the notebook\nremoveAll: void -> Removes all widgets in the notebook\ntext(name: String, defaultValue: String, label: String): void -> Creates a text input widget with a given name and default value\n\n``` \n### combobox command (dbutils.widgets.combobox) \nCreates and displays a combobox widget with the specified programmatic name, default value, choices, and optional label. \nTo display help for this command, run `dbutils.widgets.help(\"combobox\")`. \nThis example creates and displays a combobox widget with the programmatic name `fruits_combobox`. It offers the choices `apple`, `banana`, `coconut`, and `dragon fruit` and is set to the initial value of `banana`. This combobox widget has an accompanying label `Fruits`. This example ends by printing the initial value of the combobox widget, `banana`. 
\n```\ndbutils.widgets.combobox(\nname='fruits_combobox',\ndefaultValue='banana',\nchoices=['apple', 'banana', 'coconut', 'dragon fruit'],\nlabel='Fruits'\n)\n\nprint(dbutils.widgets.get(\"fruits_combobox\"))\n\n# banana\n\n``` \n```\ndbutils.widgets.combobox(\nname='fruits_combobox',\ndefaultValue='banana',\nchoices=list('apple', 'banana', 'coconut', 'dragon fruit'),\nlabel='Fruits'\n)\n\nprint(dbutils.widgets.get(\"fruits_combobox\"))\n\n# [1] \"banana\"\n\n``` \n```\ndbutils.widgets.combobox(\n\"fruits_combobox\",\n\"banana\",\nArray(\"apple\", \"banana\", \"coconut\", \"dragon fruit\"),\n\"Fruits\"\n)\n\nprint(dbutils.widgets.get(\"fruits_combobox\"))\n\n// banana\n\n``` \n```\nCREATE WIDGET COMBOBOX fruits_combobox DEFAULT \"banana\" CHOICES SELECT * FROM (VALUES (\"apple\"), (\"banana\"), (\"coconut\"), (\"dragon fruit\"))\n\nSELECT :fruits_combobox\n\n-- banana\n\n``` \n### dropdown command (dbutils.widgets.dropdown) \nCreates and displays a dropdown widget with the specified programmatic name, default value, choices, and optional label. \nTo display help for this command, run `dbutils.widgets.help(\"dropdown\")`. \nThis example creates and displays a dropdown widget with the programmatic name `toys_dropdown`. It offers the choices `alphabet blocks`, `basketball`, `cape`, and `doll` and is set to the initial value of `basketball`. This dropdown widget has an accompanying label `Toys`. This example ends by printing the initial value of the dropdown widget, `basketball`. 
\n```\ndbutils.widgets.dropdown(\nname='toys_dropdown',\ndefaultValue='basketball',\nchoices=['alphabet blocks', 'basketball', 'cape', 'doll'],\nlabel='Toys'\n)\n\nprint(dbutils.widgets.get(\"toys_dropdown\"))\n\n# basketball\n\n``` \n```\ndbutils.widgets.dropdown(\nname='toys_dropdown',\ndefaultValue='basketball',\nchoices=list('alphabet blocks', 'basketball', 'cape', 'doll'),\nlabel='Toys'\n)\n\nprint(dbutils.widgets.get(\"toys_dropdown\"))\n\n# [1] \"basketball\"\n\n``` \n```\ndbutils.widgets.dropdown(\n\"toys_dropdown\",\n\"basketball\",\nArray(\"alphabet blocks\", \"basketball\", \"cape\", \"doll\"),\n\"Toys\"\n)\n\nprint(dbutils.widgets.get(\"toys_dropdown\"))\n\n// basketball\n\n``` \n```\nCREATE WIDGET DROPDOWN toys_dropdown DEFAULT \"basketball\" CHOICES SELECT * FROM (VALUES (\"alphabet blocks\"), (\"basketball\"), (\"cape\"), (\"doll\"))\n\nSELECT :toys_dropdown\n\n-- basketball\n\n``` \n### get command (dbutils.widgets.get) \nGets the current value of the widget with the specified programmatic name. This programmatic name can be either: \n* The name of a custom widget in the notebook, for example `fruits_combobox` or `toys_dropdown`.\n* The name of a custom parameter passed to the notebook as part of a notebook task, for example `name` or `age`. For more information, see the coverage of parameters for notebook tasks in the [Create a job](https://docs.databricks.com/workflows/jobs/create-run-jobs.html#create-a-job) UI or the `notebook_params` field in the [Trigger a new job run](https://docs.databricks.com/api/workspace/jobs) (`POST /jobs/run-now`) operation in the Jobs API. \nTo display help for this command, run `dbutils.widgets.help(\"get\")`. \nThis example gets the value of the widget that has the programmatic name `fruits_combobox`. 
\n```\ndbutils.widgets.get('fruits_combobox')\n\n# banana\n\n``` \n```\ndbutils.widgets.get('fruits_combobox')\n\n# [1] \"banana\"\n\n``` \n```\ndbutils.widgets.get(\"fruits_combobox\")\n\n// res6: String = banana\n\n``` \n```\nSELECT :fruits_combobox\n\n-- banana\n\n``` \nThis example gets the value of the notebook task parameter that has the programmatic name `age`. This parameter was set to `35` when the related notebook task was run. \n```\ndbutils.widgets.get('age')\n\n# 35\n\n``` \n```\ndbutils.widgets.get('age')\n\n# [1] \"35\"\n\n``` \n```\ndbutils.widgets.get(\"age\")\n\n// res6: String = 35\n\n``` \n```\nSELECT :age\n\n-- 35\n\n``` \n### getAll command (dbutils.widgets.getAll) \nGets a mapping of all current widget names and values. This can be especially useful to quickly pass widget values to a `spark.sql()` query. \nThis command is available in Databricks Runtime 13.3 LTS and above. It is only available for Python and Scala. \nTo display help for this command, run `dbutils.widgets.help(\"getAll\")`. \nThis example gets the map of widget values and passes it as parameter arguments in a Spark SQL query. \n```\ndf = spark.sql(\"SELECT * FROM table where col1 = :param\", dbutils.widgets.getAll())\ndf.show()\n\n# Query output\n\n``` \n```\nval df = spark.sql(\"SELECT * FROM table where col1 = :param\", dbutils.widgets.getAll())\ndf.show()\n\n// res6: Query output\n\n``` \n### getArgument command (dbutils.widgets.getArgument) \nGets the current value of the widget with the specified programmatic name. If the widget does not exist, an optional message can be returned. \nNote \nThis command is deprecated. Use [dbutils.widgets.get](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets-get) instead. \nTo display help for this command, run `dbutils.widgets.help(\"getArgument\")`. \nThis example gets the value of the widget that has the programmatic name `fruits_combobox`. 
If this widget does not exist, the message `Error: Cannot find fruits combobox` is returned. \n```\ndbutils.widgets.getArgument('fruits_combobox', 'Error: Cannot find fruits combobox')\n\n# Deprecation warning: Use dbutils.widgets.text() or dbutils.widgets.dropdown() to create a widget and dbutils.widgets.get() to get its bound value.\n# Out[3]: 'banana'\n\n``` \n```\ndbutils.widgets.getArgument('fruits_combobox', 'Error: Cannot find fruits combobox')\n\n# Deprecation warning: Use dbutils.widgets.text() or dbutils.widgets.dropdown() to create a widget and dbutils.widgets.get() to get its bound value.\n# [1] \"banana\"\n\n``` \n```\ndbutils.widgets.getArgument(\"fruits_combobox\", \"Error: Cannot find fruits combobox\")\n\n// command-1234567890123456:1: warning: method getArgument in trait WidgetsUtils is deprecated: Use dbutils.widgets.text() or dbutils.widgets.dropdown() to create a widget and dbutils.widgets.get() to get its bound value.\n// dbutils.widgets.getArgument(\"fruits_combobox\", \"Error: Cannot find fruits combobox\")\n// ^\n// res7: String = banana\n\n``` \n### multiselect command (dbutils.widgets.multiselect) \nCreates and displays a multiselect widget with the specified programmatic name, default value, choices, and optional label. \nTo display help for this command, run `dbutils.widgets.help(\"multiselect\")`. \nThis example creates and displays a multiselect widget with the programmatic name `days_multiselect`. It offers the choices `Monday` through `Sunday` and is set to the initial value of `Tuesday`. This multiselect widget has an accompanying label `Days of the Week`. This example ends by printing the initial value of the multiselect widget, `Tuesday`. 
\n```\ndbutils.widgets.multiselect(\nname='days_multiselect',\ndefaultValue='Tuesday',\nchoices=['Monday', 'Tuesday', 'Wednesday', 'Thursday',\n'Friday', 'Saturday', 'Sunday'],\nlabel='Days of the Week'\n)\n\nprint(dbutils.widgets.get(\"days_multiselect\"))\n\n# Tuesday\n\n``` \n```\ndbutils.widgets.multiselect(\nname='days_multiselect',\ndefaultValue='Tuesday',\nchoices=list('Monday', 'Tuesday', 'Wednesday', 'Thursday',\n'Friday', 'Saturday', 'Sunday'),\nlabel='Days of the Week'\n)\n\nprint(dbutils.widgets.get(\"days_multiselect\"))\n\n# [1] \"Tuesday\"\n\n``` \n```\ndbutils.widgets.multiselect(\n\"days_multiselect\",\n\"Tuesday\",\nArray(\"Monday\", \"Tuesday\", \"Wednesday\", \"Thursday\",\n\"Friday\", \"Saturday\", \"Sunday\"),\n\"Days of the Week\"\n)\n\nprint(dbutils.widgets.get(\"days_multiselect\"))\n\n// Tuesday\n\n``` \n```\nCREATE WIDGET MULTISELECT days_multiselect DEFAULT \"Tuesday\" CHOICES SELECT * FROM (VALUES (\"Monday\"), (\"Tuesday\"), (\"Wednesday\"), (\"Thursday\"), (\"Friday\"), (\"Saturday\"), (\"Sunday\"))\n\nSELECT :days_multiselect\n\n-- Tuesday\n\n``` \n### remove command (dbutils.widgets.remove) \nRemoves the widget with the specified programmatic name. \nTo display help for this command, run `dbutils.widgets.help(\"remove\")`. \nImportant \nIf you add a command to remove a widget, you cannot add a subsequent command to create a widget in the same cell. You must create the widget in another cell. \nThis example removes the widget with the programmatic name `fruits_combobox`. \n```\ndbutils.widgets.remove('fruits_combobox')\n\n``` \n```\ndbutils.widgets.remove('fruits_combobox')\n\n``` \n```\ndbutils.widgets.remove(\"fruits_combobox\")\n\n``` \n```\nREMOVE WIDGET fruits_combobox\n\n``` \n### removeAll command (dbutils.widgets.removeAll) \nRemoves all widgets from the notebook. \nTo display help for this command, run `dbutils.widgets.help(\"removeAll\")`. 
\nImportant \nIf you add a command to remove all widgets, you cannot add a subsequent command to create any widgets in the same cell. You must create the widgets in another cell. \nThis example removes all widgets from the notebook. \n```\ndbutils.widgets.removeAll()\n\n``` \n```\ndbutils.widgets.removeAll()\n\n``` \n```\ndbutils.widgets.removeAll()\n\n``` \n### text command (dbutils.widgets.text) \nCreates and displays a text widget with the specified programmatic name, default value, and optional label. \nTo display help for this command, run `dbutils.widgets.help(\"text\")`. \nThis example creates and displays a text widget with the programmatic name `your_name_text`. It is set to the initial value of `Enter your name`. This text widget has an accompanying label `Your name`. This example ends by printing the initial value of the text widget, `Enter your name`. \n```\ndbutils.widgets.text(\nname='your_name_text',\ndefaultValue='Enter your name',\nlabel='Your name'\n)\n\nprint(dbutils.widgets.get(\"your_name_text\"))\n\n# Enter your name\n\n``` \n```\ndbutils.widgets.text(\nname='your_name_text',\ndefaultValue='Enter your name',\nlabel='Your name'\n)\n\nprint(dbutils.widgets.get(\"your_name_text\"))\n\n# [1] \"Enter your name\"\n\n``` \n```\ndbutils.widgets.text(\n\"your_name_text\",\n\"Enter your name\",\n\"Your name\"\n)\n\nprint(dbutils.widgets.get(\"your_name_text\"))\n\n// Enter your name\n\n``` \n```\nCREATE WIDGET TEXT your_name_text DEFAULT \"Enter your name\"\n\nSELECT :your_name_text\n\n-- Enter your name\n\n```\n\n", "chunk_id": "bb622b8290fa7844fc080e49905af166", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n##### Databricks Utilities API library\n\nImportant \nThe Databricks Utilities API (`dbutils-api`) library is deprecated. 
Although this library is still available, Databricks plans no new feature work for the `dbutils-api` library. \nDatabricks recommends that you use one of the following libraries instead: \n* [Databricks Utilities for Scala, with Java](https://docs.databricks.com/dev-tools/sdk-java.html#dbutils-java)\n* [Databricks Utilities for Scala, with Scala](https://docs.databricks.com/dev-tools/sdk-java.html#dbutils-scala) \nTo accelerate application development, it can be helpful to compile, build, and test applications before you deploy them as production jobs. To enable you to compile against Databricks Utilities, Databricks provides the `dbutils-api` library. You can download the `dbutils-api` library from the [DBUtils API](https://mvnrepository.com/artifact/com.databricks/dbutils-api) webpage on the Maven Repository website or include the library by adding a dependency to your build file: \n* SBT \n```\nlibraryDependencies += \"com.databricks\" % \"dbutils-api_TARGET\" % \"VERSION\"\n\n```\n* Maven \n```\n<dependency>\n<groupId>com.databricks</groupId>\n<artifactId>dbutils-api_TARGET</artifactId>\n<version>VERSION</version>\n</dependency>\n\n```\n* Gradle \n```\ncompile 'com.databricks:dbutils-api_TARGET:VERSION'\n\n``` \nReplace `TARGET` with the desired target (for example `2.12`) and `VERSION` with the desired version (for example `0.0.5`). For a list of available targets and versions, see the [DBUtils API](https://mvnrepository.com/artifact/com.databricks/dbutils-api) webpage on the Maven Repository website. \nOnce you build your application against this library, you can deploy the application. \nImportant \nThe `dbutils-api` library allows you to locally compile an application that uses `dbutils`, but not to run it. 
To run the application, you must deploy it in Databricks.\n\n", "chunk_id": "ea322e913779dbae66383465ebd56aca", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n##### Limitations\n\nCalling `dbutils` inside of executors can produce unexpected results or potentially result in errors. \nIf you need to run file system operations on executors using `dbutils`, there are several faster and more scalable alternatives available: \n* For file copy or move operations, see the faster approach described in [Parallelize filesystem operations](https://kb.databricks.com/dbfs/parallelize-fs-operations.html).\n* For file system list and delete operations, see the parallel listing and delete methods that use Spark in [How to list and delete files faster in Databricks](https://kb.databricks.com/data/list-delete-files-faster.html). \nFor information about executors, see [Cluster Mode Overview](https://spark.apache.org/docs/latest/cluster-overview.html) on the Apache Spark website.\n\n", "chunk_id": "6e724c31e835c9e63f6eabeef2cbab1d", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### DROP SCHEMA\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nDrops a schema and deletes the directory associated with the schema from the file system. An\nexception is thrown if the schema does not exist in the system. To drop a schema you must be its owner. 
\nWhile usage of `SCHEMA` and `DATABASE` is interchangeable, `SCHEMA` is preferred.\n\n#### DROP SCHEMA\n##### Syntax\n\n```\nDROP SCHEMA [ IF EXISTS ] schema_name [ RESTRICT | CASCADE ]\n\n```\n\n#### DROP SCHEMA\n##### Parameters\n\n* **IF EXISTS** \nIf specified, no exception is thrown when the schema does not exist.\n* **[schema\\_name](https://docs.databricks.com/sql/language-manual/sql-ref-names.html#schema-name)** \nThe name of an existing schema in the system.\nIf the name does not exist, an exception is thrown.\n* **RESTRICT** \nIf specified, restricts dropping a non-empty schema and is enabled by default.\n* **CASCADE** \nIf specified, drops all the associated tables and functions recursively. In Unity Catalog, dropping a schema using `CASCADE` soft-deletes tables: managed table files will be cleaned up after 30 days, but external files are not deleted. **Warning!** If the schema is managed by the workspace-level Hive metastore, dropping a schema using `CASCADE` recursively deletes all files in the specified location, regardless of the table type (managed or external).\n\n", "chunk_id": "746f561080952f79fe2b9b3ec92af1a6", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-drop-schema.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### DROP SCHEMA\n##### Examples\n\n```\n-- Create `inventory_schema` Database\n> CREATE SCHEMA inventory_schema COMMENT 'This schema is used to maintain Inventory';\n\n-- Drop the schema and its tables\n> DROP SCHEMA inventory_schema CASCADE;\n\n-- Drop the schema using IF EXISTS\n> DROP SCHEMA IF EXISTS inventory_schema CASCADE;\n\n```\n\n#### DROP SCHEMA\n##### Related articles\n\n* [CREATE SCHEMA](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-schema.html)\n* [DESCRIBE SCHEMA](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-describe-schema.html)\n* [SHOW 
SCHEMAS](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-show-schemas.html)\n\n", "chunk_id": "f4254891fef704a6041a2608d4e15508", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-drop-schema.html"} +{"chunked_text": "# Databricks data engineering\n## Libraries\n#### Install libraries from workspace files\n\nThis article walks you through the steps required to upload package or requirements.txt files to workspace files and install them onto clusters in Databricks. You can install libraries onto all-purpose compute or job compute. \nImportant \nThis article describes storing libraries as workspace files. This is different than [workspace libraries](https://docs.databricks.com/archive/legacy/workspace-libraries.html) which are deprecated. \nFor more information about workspace files, see [Navigate the workspace](https://docs.databricks.com/workspace/index.html). \nFor full library compatibility details, see [Libraries](https://docs.databricks.com/libraries/index.html).\n\n#### Install libraries from workspace files\n##### Load libraries to workspace files\n\nYou can load libraries to workspace files the same way you load other files. \nTo load a library to workspace files: \n1. Click ![Workspace Icon](https://docs.databricks.com/_images/workspace-icon.png) **Workspace** in the left sidebar.\n2. Navigate to the location in the workspace where you want to upload the library.\n3. Click the ![Kebab menu](https://docs.databricks.com/_images/kebab-menu.png) in the upper right and choose **Import**.\n4. The **Import** dialog appears. For **Import from:** choose **File** or **URL**. Drag and drop or browse to the file(s) you want to upload, or provide the URL path to the file.\n5. 
Click **Import**.\n\n", "chunk_id": "7e3d15e9b36dc0467dbbedd87b0d78d1", "url": "https://docs.databricks.com/libraries/workspace-files-libraries.html"} +{"chunked_text": "# Databricks data engineering\n## Libraries\n#### Install libraries from workspace files\n##### Install libraries from workspace files onto a cluster\n\nWhen you install a library onto a cluster, all notebooks running on that cluster have access to the library. \nTo install a library from workspace files onto a cluster: \n1. Click ![compute icon](https://docs.databricks.com/_images/clusters-icon.png) **Compute** in the left sidebar.\n2. Click the name of the cluster in the cluster list.\n3. Click the **Libraries** tab.\n4. Click **Install new**. The **Install library** dialog appears.\n5. For **Library Source**, select **Workspace**.\n6. Upload the library or requirements.txt file, browse to the library or requirements.txt in the workspace, or enter its workspace location in the **Workspace File Path** field, such as the following:\n`/Workspace/Users/someone@example.com//.`\n7. Click **Install**.\n\n", "chunk_id": "b97a68e754c4b52e13583aefed6be56d", "url": "https://docs.databricks.com/libraries/workspace-files-libraries.html"} +{"chunked_text": "# Databricks data engineering\n## Libraries\n#### Install libraries from workspace files\n##### Add dependent libraries to workflow tasks from workspace files\n\nYou can add dependent libraries to tasks from workspace files. See [Configure dependent libraries](https://docs.databricks.com/workflows/jobs/settings.html#task-config-dependent-libraries). \nTo configure a workflow task with a dependent library from workspace files: \n1. Select an existing task in a workflow or create a new task.\n2. Next to **Dependent libraries**, click **+ Add**.\n3. In the **Add dependent library** dialog, select **Workspace** for **Library Source**.\n4. 
Upload the library or requirements.txt file, browse to the library or requirements.txt file in the workspace, or enter its workspace location in the **Workspace File Path** field, such as the following:\n`/Workspace/Users/someone@example.com//.`\n5. Click **Install**.\n\n#### Install libraries from workspace files\n##### Install libraries from workspace files to a notebook\n\nYou can install Python libraries directly to a notebook to create custom environments that are specific to the notebook. For example, you can use a specific version of a library in a notebook, without affecting other users on the cluster who may need a different version of the same library. For more information, see [notebook-scoped libraries](https://docs.databricks.com/libraries/notebooks-python-libraries.html). \nWhen you install a library to a notebook, only the current notebook and any jobs associated with that notebook have access to that library. Other notebooks attached to the same cluster are not affected.\n\n", "chunk_id": "c3d93fd5edcbc066a55b9c9f81bc0c5a", "url": "https://docs.databricks.com/libraries/workspace-files-libraries.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is CI/CD on Databricks?\n#### What are Databricks Asset Bundles?\n###### Define artifact settings dynamically in Databricks Asset Bundles\n\nThis article describes how to override the settings for artifacts in *Databricks Asset Bundles*. 
See [What are Databricks Asset Bundles?](https://docs.databricks.com/dev-tools/bundles/index.html) \nIn Databricks [bundle configuration files](https://docs.databricks.com/dev-tools/bundles/settings.html), you can join the artifact settings in a top-level `artifacts` mapping with the artifact settings in a `targets` mapping, for example (ellipses indicate omitted content, for brevity): \n```\n# ...\nartifacts:\n:\n# Artifact settings.\n\ntargets:\n:\nresources:\nartifacts:\n:\n# Any more artifact settings to join with the settings from the\n# matching top-level artifacts mapping.\n\n``` \nIf any artifact setting is defined both in the top-level `artifacts` mapping and the `targets` mapping for the same artifact, then the setting in the `targets` mapping takes precedence over the setting in the top-level `artifacts` mapping.\n\n", "chunk_id": "1cd2bc7fe6dd80ad090786ae885e3d23", "url": "https://docs.databricks.com/dev-tools/bundles/artifact-overrides.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is CI/CD on Databricks?\n#### What are Databricks Asset Bundles?\n###### Define artifact settings dynamically in Databricks Asset Bundles\n####### Example 1: Artifact settings defined only in the top-level artifacts mapping\n\nTo demonstrate how this works in practice, in the following example, `path` is defined in the top-level `artifacts` mapping, which defines all of the settings for the artifact (ellipses indicate omitted content, for brevity): \n```\n# ...\nartifacts:\nmy-artifact:\ntype: whl\npath: ./my_package\n# ...\n\n``` \nWhen you run `databricks bundle validate` for this example, the resulting graph is (ellipses indicate omitted content, for brevity): \n```\n{\n\"...\": \"...\",\n\"artifacts\": {\n\"my-artifact\": {\n\"type\": \"whl\",\n\"path\": \"./my_package\",\n\"...\": \"...\"\n}\n},\n\"...\": \"...\"\n}\n\n```\n\n", "chunk_id": "4f7dd889a3719393fc21adf0d0c3bb70", "url": 
"https://docs.databricks.com/dev-tools/bundles/artifact-overrides.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is CI/CD on Databricks?\n#### What are Databricks Asset Bundles?\n###### Define artifact settings dynamically in Databricks Asset Bundles\n####### Example 2: Conflicting artifact settings defined in multiple artifact mappings\n\nIn this example, `path` is defined both in the top-level `artifacts` mapping and in the `artifacts` mapping in `targets`. In this example, `path` in the `artifacts` mapping in `targets` takes precedence over `path` in the top-level `artifacts` mapping, to define the settings for the artifact (ellipses indicate omitted content, for brevity): \n```\n# ...\nartifacts:\nmy-artifact:\ntype: whl\npath: ./my_package\n\ntargets:\ndev:\nartifacts:\nmy-artifact:\npath: ./my_other_package\n# ...\n\n``` \nWhen you run `databricks bundle validate` for this example, the resulting graph is (ellipses indicate omitted content, for brevity): \n```\n{\n\"...\": \"...\",\n\"artifacts\": {\n\"my-artifact\": {\n\"type\": \"whl\",\n\"path\": \"./my_other_package\",\n\"...\": \"...\"\n}\n},\n\"...\": \"...\"\n}\n\n```\n\n", "chunk_id": "5bd3c08b529bd0ce4ec2290bf785c154", "url": "https://docs.databricks.com/dev-tools/bundles/artifact-overrides.html"} +{"chunked_text": "# Ingest data into a Databricks lakehouse\n## Get started using COPY INTO to load data\n#### Load data using COPY INTO with an instance profile\n\nThis article describes how to use the `COPY INTO` command to load data from an Amazon S3 bucket in your AWS account into a table in Databricks SQL. \nThe steps in this article assume that your admin has configured a SQL warehouse to use an AWS instance profile so that you can access your source files in S3. 
If your admin configured a Unity Catalog external location with a storage credential, see [Load data using COPY INTO with Unity Catalog volumes or external locations](https://docs.databricks.com/ingestion/copy-into/unity-catalog.html) instead. If your admin gave you temporary credentials (an AWS access key ID, a secret key, and a session token), see [Load data using COPY INTO with temporary credentials](https://docs.databricks.com/ingestion/copy-into/temporary-credentials.html) instead. \nDatabricks recommends using the [COPY INTO](https://docs.databricks.com/ingestion/copy-into/index.html) command for incremental and bulk data loading with Databricks SQL. \nNote \n`COPY INTO` works well for data sources that contain thousands of files. Databricks recommends that you use [Auto Loader](https://docs.databricks.com/ingestion/auto-loader/index.html) for loading millions of files, which is not supported in Databricks SQL.\n\n#### Load data using COPY INTO with an instance profile\n##### Before you begin\n\nBefore you load data into Databricks, make sure you have the following: \n* Access to data in S3. 
Your admin must first complete the steps in [Configure data access for ingestion](https://docs.databricks.com/ingestion/copy-into/configure-data-access.html) so your Databricks SQL warehouse can read your source files.\n* A Databricks SQL warehouse that uses the instance profile that your admin created.\n* The **Can manage** permission on the SQL warehouse.\n* The fully qualified S3 URI.\n* Familiarity with the Databricks SQL user interface.\n\n", "chunk_id": "11aff8dcae26fce63d44e7ea6ad300a9", "url": "https://docs.databricks.com/ingestion/copy-into/tutorial-dbsql.html"} +{"chunked_text": "# Ingest data into a Databricks lakehouse\n## Get started using COPY INTO to load data\n#### Load data using COPY INTO with an instance profile\n##### Step 1: Confirm access to data in cloud storage\n\nTo confirm that you have access to the correct data in cloud object storage, do the following: \n1. In the sidebar, click **Create > Query**.\n2. In the SQL editor\u2019s menu bar, select a SQL warehouse.\n3. In the SQL editor, paste the following code: \n```\nselect * from csv.\n\n``` \nReplace `` with the S3 URI that you received from your admin. For example, `s3:////`.\n4. Click **Run**.\n\n#### Load data using COPY INTO with an instance profile\n##### Step 2: Create a table\n\nThis step describes how to create a table in your Databricks workspace to hold the incoming data. \n1. In the SQL editor, paste the following code: \n```\nCREATE TABLE .. (\ntpep_pickup_datetime TIMESTAMP,\ntpep_dropoff_datetime TIMESTAMP,\ntrip_distance DOUBLE,\nfare_amount DOUBLE,\npickup_zip INT,\ndropoff_zip INT\n);\n\n```\n2. 
Click **Run**.\n\n", "chunk_id": "2b7a916db1185300dc7e768818bf5123", "url": "https://docs.databricks.com/ingestion/copy-into/tutorial-dbsql.html"} +{"chunked_text": "# Ingest data into a Databricks lakehouse\n## Get started using COPY INTO to load data\n#### Load data using COPY INTO with an instance profile\n##### Step 3: Load data from cloud storage into the table\n\nThis step describes how to load data from an S3 bucket into the table you created in your Databricks workspace. \n1. In the sidebar, click **Create > Query**.\n2. In the SQL editor\u2019s menu bar, select a SQL warehouse and make sure the SQL warehouse is running.\n3. In the SQL editor, paste the following code. In this code, replace: \n* `` with the name of your S3 bucket.\n* `` with the name of the folder in your S3 bucket.\n```\nCOPY INTO ..\nFROM 's3:////'\nFILEFORMAT = CSV\nFORMAT_OPTIONS (\n'header' = 'true',\n'inferSchema' = 'true'\n)\nCOPY_OPTIONS (\n'mergeSchema' = 'true'\n);\n\nSELECT * FROM ..;\n\n``` \nNote \n`FORMAT_OPTIONS` differs depending on `FILEFORMAT`. In this case, the `header` option instructs Databricks to treat the first row of the CSV file as a header, and the `inferSchema` option instructs Databricks to automatically determine the data type of each field in the CSV file.\n4. Click **Run**. \nNote \nIf you click **Run** again, no new data is loaded into the table. This is because the `COPY INTO` command only processes what it considers to be new data.\n\n", "chunk_id": "5e9a5cf2c2dfb98e81c14ff749732c42", "url": "https://docs.databricks.com/ingestion/copy-into/tutorial-dbsql.html"} +{"chunked_text": "# Ingest data into a Databricks lakehouse\n## Get started using COPY INTO to load data\n#### Load data using COPY INTO with an instance profile\n##### Clean up\n\nYou can clean up the associated resources in your workspace if you no longer want to keep them. \n### Delete the tables \n1. In the sidebar, click **Create > Query**.\n2. 
Select a SQL warehouse and make sure that the SQL warehouse is running.\n3. Paste the following code: \n```\nDROP TABLE ..;\n\n```\n4. Click **Run**.\n5. Hover over the tab for this query, and then click the **X** icon. \n### Delete the queries in the SQL editor \n1. In the sidebar, click **SQL Editor**.\n2. In the SQL editor\u2019s menu bar, hover over the tab for each query that you created for this tutorial, and then click the **X** icon.\n\n#### Load data using COPY INTO with an instance profile\n##### Additional resources\n\n* The [COPY INTO](https://docs.databricks.com/sql/language-manual/delta-copy-into.html) reference article\n\n", "chunk_id": "5ff84ad2f9cf42da1d7747f641a41a24", "url": "https://docs.databricks.com/ingestion/copy-into/tutorial-dbsql.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL connector\n#### driver\n##### or API\n###### Databricks ODBC and JDBC Drivers\n####### Databricks ODBC Driver\n######### Driver capability settings for the Databricks ODBC Driver\n\nThis article describes how to configure special and advanced driver capability settings for the [Databricks ODBC Driver](https://docs.databricks.com/integrations/odbc/index.html). \nThe Databricks ODBC Driver provides the following special and advanced driver capability settings. 
\n* [Set the initial schema in ODBC](https://docs.databricks.com/integrations/odbc/capability.html#odbc-native)\n* [ANSI SQL-92 query support in ODBC](https://docs.databricks.com/integrations/odbc/capability.html#odbc-ansi)\n* [Extract large query results in ODBC](https://docs.databricks.com/integrations/odbc/capability.html#odbc-extract)\n* [Arrow serialization in ODBC](https://docs.databricks.com/integrations/odbc/capability.html#odbc-arrow)\n* [Cloud Fetch in ODBC](https://docs.databricks.com/integrations/odbc/capability.html#cloud-fetch-in-odbc)\n* [Advanced configurations](https://docs.databricks.com/integrations/odbc/capability.html#advanced-configurations)\n* [Enable logging](https://docs.databricks.com/integrations/odbc/capability.html#enable-logging)\n\n######### Driver capability settings for the Databricks ODBC Driver\n########## Set the initial schema in ODBC\n\nThe ODBC driver allows you to specify the schema by setting `Schema=` as a connection configuration. This is equivalent to running `USE `.\n\n", "chunk_id": "8ae929e6f1aa57a29b9d3f9289e18808", "url": "https://docs.databricks.com/integrations/odbc/capability.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL connector\n#### driver\n##### or API\n###### Databricks ODBC and JDBC Drivers\n####### Databricks ODBC Driver\n######### Driver capability settings for the Databricks ODBC Driver\n########## ANSI SQL-92 query support in ODBC\n\nThe ODBC driver accepts SQL queries in ANSI SQL-92 dialect and translates the queries to the Databricks SQL dialect. However, if your application generates Databricks SQL directly or your application uses any non-ANSI SQL-92 standard SQL syntax specific to Databricks, Databricks recommends that you set `UseNativeQuery=1` as a connection configuration. 
With that setting, the driver passes the SQL queries verbatim to Databricks.\n\n######### Driver capability settings for the Databricks ODBC Driver\n########## Extract large query results in ODBC\n\nTo achieve the best performance when you extract large query results, use the latest version of the ODBC driver that includes the following optimizations.\n\n######### Driver capability settings for the Databricks ODBC Driver\n########## Arrow serialization in ODBC\n\nODBC driver version 2.6.15 and above supports an optimized query results serialization format that uses [Apache Arrow](https://arrow.apache.org/docs/index.html).\n\n", "chunk_id": "839b05a11852a9b91ac861ae4d7971b2", "url": "https://docs.databricks.com/integrations/odbc/capability.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL connector\n#### driver\n##### or API\n###### Databricks ODBC and JDBC Drivers\n####### Databricks ODBC Driver\n######### Driver capability settings for the Databricks ODBC Driver\n########## Cloud Fetch in ODBC\n\nODBC driver version 2.6.17 and above supports Cloud Fetch, a capability that fetches query results through the cloud storage that is set up in your Databricks deployment. \nQuery results are uploaded to an internal [DBFS storage location](https://docs.databricks.com/dbfs/index.html) as Arrow-serialized files of up to 20 MB. When the driver sends fetch requests after query completion, Databricks generates and returns [presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-presigned-url.html) to the uploaded files. The ODBC driver then uses the URLs to download the results directly from DBFS. \nCloud Fetch is only used for query results larger than 1 MB. Smaller results are retrieved directly from Databricks. \nDatabricks automatically garbage collects the accumulated files, which are marked for deletion after 24 hours. These marked files are completely deleted after an additional 24 hours. 
\nCloud Fetch is only available for E2 workspaces. Also, your corresponding Amazon S3 buckets must not have versioning enabled. If you have versioning enabled, you can still enable Cloud Fetch by following the instructions in [Advanced configurations](https://docs.databricks.com/integrations/odbc/capability.html#advanced-configurations). \nTo learn more about the Cloud Fetch architecture, see [How We Achieved High-bandwidth Connectivity With BI Tools](https://databricks.com/blog/2021/08/11/how-we-achieved-high-bandwidth-connectivity-with-bi-tools.html).\n\n", "chunk_id": "8bc24dabe5bcad713e551a8ba1601a16", "url": "https://docs.databricks.com/integrations/odbc/capability.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL connector\n#### driver\n##### or API\n###### Databricks ODBC and JDBC Drivers\n####### Databricks ODBC Driver\n######### Driver capability settings for the Databricks ODBC Driver\n########## Advanced configurations\n\nIf you have enabled [S3 bucket versioning](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html) on your [DBFS root](https://docs.databricks.com/dbfs/index.html), then Databricks cannot garbage collect older versions of uploaded query results. Databricks recommends that you first set an S3 lifecycle policy that purges older versions of uploaded query results. \nTo set a lifecycle policy, follow these steps: \n1. In the AWS console, go to the **S3** service.\n2. Click the [S3 bucket](https://docs.databricks.com/admin/account-settings-e2/storage.html) that you use for your workspace\u2019s root storage.\n3. Open the **Management** tab and choose **Create lifecycle rule**.\n4. Choose any name for the **Lifecycle rule name**.\n5. Keep the prefix field empty.\n6. Under **Lifecycle rule actions** select **Permanently delete noncurrent versions of objects**.\n7. Set a value under **Days after objects become noncurrent**. Databricks recommends using the value 1 here.\n8. Click **Create rule**. 
\n![Lifecycle policy](https://docs.databricks.com/_images/lifecycle-policy-with-tags.png)\n\n", "chunk_id": "7e8fafb46adc1269d83db01691a9dc7d", "url": "https://docs.databricks.com/integrations/odbc/capability.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL connector\n#### driver\n##### or API\n###### Databricks ODBC and JDBC Drivers\n####### Databricks ODBC Driver\n######### Driver capability settings for the Databricks ODBC Driver\n########## Enable logging\n\nTo enable logging in the ODBC driver for Windows, set the following fields in the ODBC Data Source Administrator for the related DSN: \n* Set the **Log Level** field from **FATAL** to log only severe events through **TRACE** to log all driver activity.\n* Set the **Log Path** field to the full path to the folder where you want to save log files.\n* Set the **Max Number Files** field to the maximum number of log files to keep.\n* Set the **Max File Size** field to the maximum size of each log file in megabytes. \nTo enable logging in the ODBC driver for a non-Windows machine, set the following properties in the related DSN or DSN-less connection string: \n* Set the `LogLevel` property from `1` to log only severe events through `6` to log all driver activity.\n* Set the `LogPath` property to the full path to the folder where you want to save log files.\n* Set the `LogFileCount` property to the maximum number of log files to keep.\n* Set the `LogFileSize` property to the maximum size of each log file in bytes. 
\nFor more information, see the sections `Configuring Logging Options on Windows` and `Configuring Logging Options on a Non-Windows Machine` in the [Databricks JDBC Driver Guide](https://docs.databricks.com/_extras/documents/Databricks-JDBC-Driver-Install-and-Configuration-Guide.pdf).\n\n", "chunk_id": "2d7e2c5e78e69d41b1737c4c2e955016", "url": "https://docs.databricks.com/integrations/odbc/capability.html"} +{"chunked_text": "# What is Delta Lake?\n### Data skipping for Delta Lake\n\nData skipping information is collected automatically when you write data into a Delta table. Delta Lake on Databricks takes advantage of this information (minimum and maximum values, null counts, and total records per file) at query time to provide faster queries. \nNote \nIn Databricks Runtime 13.3 and above, Databricks recommends using clustering for Delta table layout. Clustering is not compatible with Z-ordering. See [Use liquid clustering for Delta tables](https://docs.databricks.com/delta/clustering.html). \nYou must have statistics collected for columns that are used in `ZORDER` statements. See [What is Z-ordering?](https://docs.databricks.com/delta/data-skipping.html#delta-zorder).\n\n", "chunk_id": "7763a2a95f996984ac1cbb3000921ba6", "url": "https://docs.databricks.com/delta/data-skipping.html"} +{"chunked_text": "# What is Delta Lake?\n### Data skipping for Delta Lake\n#### Specify Delta statistics columns\n\nBy default, Delta Lake collects statistics on the first 32 columns defined in your table schema. For this collection, each field in a nested column is considered an individual column. You can modify this behavior by setting one of the following table properties: \n| Table property | Databricks Runtime supported | Description |\n| --- | --- | --- |\n| `delta.dataSkippingNumIndexedCols` | All supported Databricks Runtime versions | Increase or decrease the number of columns on which Delta collects statistics. Depends on column order. 
|\n| `delta.dataSkippingStatsColumns` | Databricks Runtime 13.3 LTS and above | Specify a list of column names for which Delta Lake collects statistics. Supersedes `dataSkippingNumIndexedCols`. | \nTable properties can be set at table creation or with `ALTER TABLE` statements. See [Delta table properties reference](https://docs.databricks.com/delta/table-properties.html). \nUpdating this property does not automatically recompute statistics for existing data. Rather, it impacts the behavior of future statistics collection when adding or updating data in the table. Delta Lake does not leverage statistics for columns not included in the current list of statistics columns. \nIn Databricks Runtime 14.3 LTS and above, you can manually trigger the recomputation of statistics for a Delta table using the following command: \n```\nANALYZE TABLE table_name COMPUTE DELTA STATISTICS\n\n``` \nNote \nLong strings are truncated during statistics collection. You might choose to exclude long string columns from statistics collection, especially if the columns aren\u2019t used frequently for filtering queries.\n\n", "chunk_id": "f7a0ec63f58a248c29c052a31160f4a7", "url": "https://docs.databricks.com/delta/data-skipping.html"} +{"chunked_text": "# What is Delta Lake?\n### Data skipping for Delta Lake\n#### What is Z-ordering?\n\nZ-ordering is a [technique](https://en.wikipedia.org/wiki/Z-order_curve) to colocate related information in the same set of files. This co-locality is automatically used by Delta Lake on Databricks data-skipping algorithms. This behavior dramatically reduces the amount of data that Delta Lake on Databricks needs to read. 
To Z-order data, you specify the columns to order on in the `ZORDER BY` clause: \n```\nOPTIMIZE events\nWHERE date >= current_timestamp() - INTERVAL 1 day\nZORDER BY (eventType)\n\n``` \nIf you expect a column to be commonly used in query predicates and if that column has high cardinality (that is, a large number of distinct values), then use `ZORDER BY`. \nYou can specify multiple columns for `ZORDER BY` as a comma-separated list. However, the effectiveness of the locality drops with each extra column. Z-ordering on columns that do not have statistics collected on them would be ineffective and a waste of resources. This is because data skipping requires column-local stats such as min, max, and count. You can configure statistics collection on certain columns by reordering columns in the schema, or you can increase the number of columns to collect statistics on. \nNote \n* Z-ordering is *not idempotent* but aims to be an incremental operation. The time it takes for Z-ordering is not guaranteed to reduce over multiple runs. However, if no new data was added to a partition that was just Z-ordered, another Z-ordering of that partition will not have any effect.\n* Z-ordering aims to produce evenly-balanced data files with respect to the number of tuples, but not necessarily data size on disk. The two measures are most often correlated, but there can be situations when that is not the case, leading to skew in optimize task times. \nFor example, if you `ZORDER BY` *date* and your most recent records are all much wider (for example longer arrays or string values) than the ones in the past, it is expected that the `OPTIMIZE` job\u2019s task durations will be skewed, as well as the resulting file sizes. 
This is, however, only a problem for the `OPTIMIZE` command itself; it should not have any negative impact on subsequent queries.\n\n", "chunk_id": "44f970f09908b95261ae0dac7ee387c0", "url": "https://docs.databricks.com/delta/data-skipping.html"} +{"chunked_text": "# Get started: Account and workspace setup\n## Navigate the workspace\n#### Search for workspace objects\n\nThis article describes how to search for tables, notebooks, queries, dashboards, alerts, files, folders, libraries, jobs, repos, partners, and Marketplace listings in your Databricks workspace. \nTables must be registered in Unity Catalog to appear in search results. \nNote \nThe search behavior described in this section is not supported for non-E2 workspaces. In those workspaces, you can click ![Search Icon](https://docs.databricks.com/_images/search-icon.png) **Search** in the sidebar and type a search string in the **Search Workspace** field. As you type, objects whose name contains the search string are listed. Click a name from the list to open that item in the workspace. \nIn workspaces that use [customer-managed keys for encryption](https://docs.databricks.com/security/keys/customer-managed-keys.html), notebook contents and query contents are not available in search.\n\n#### Search for workspace objects\n##### Intelligent search\n\nDatabricks search leverages [DatabricksIQ](https://docs.databricks.com/databricksiq/index.html), the Data Intelligence Engine for Databricks, to provide a more intelligent AI-powered search experience. AI-generated comments use LLMs to automatically add descriptions and tags to tables and columns managed by Unity Catalog. 
These comments make the search engine aware of unique company jargon, metrics, and semantics, giving it the context needed to make search results more relevant, accurate, and actionable.\n\n", "chunk_id": "83141b93540645bb97fe39867c45aa04", "url": "https://docs.databricks.com/search/index.html"} +{"chunked_text": "# Get started: Account and workspace setup\n## Navigate the workspace\n#### Search for workspace objects\n##### Navigational search\n\nTo search the workspace using navigational search in the top bar of the UI, do the following: \n1. Click the **Search** field in the top bar of the Databricks workspace or use the keyboard shortcut Command-P. \n![Navigational search bar](https://docs.databricks.com/_images/navigational-search.png) \nYour recent files, notebooks, queries, alerts, and dashboards are listed under **Recents**, sorted by the last opened date.\n2. Enter your search criteria. \nRecent objects in the list are filtered to match your search criteria. Navigational search might also suggest other objects that match your criteria. To perform a complete search of the workspace, use the **Search results** page.\n3. Select an item from the list.\n\n#### Search for workspace objects\n##### Search results page\n\nThe full-page search experience gives you more space to see results, more metadata for your objects, and more filters to narrow down your results. \nTo filter search results by object type, object owner, or last modified date on the **Search results** page, do the following: \n1. Click the **Search** field in the top bar of the Databricks workspace or use the keyboard shortcut Command-P, and then press Enter. \nThe **Search results** page opens.\n2. Enter your search criteria.\n3. Select an item from the list. \nYou can search by text string, by object type, or both. 
After you type your search criteria and press **Enter**, the system searches the names of all queries, dashboards, alerts, files, folders, notebooks, libraries, repos, partners, and Marketplace listings in the workspace that you have access to. If your workspace is [enabled for Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/enable-workspaces.html), the system also searches table names, table comments, column names, and column comments.\n\n", "chunk_id": "4a2f672f634b59a9a8adfeff930d2ec6", "url": "https://docs.databricks.com/search/index.html"} +{"chunked_text": "# Get started: Account and workspace setup\n## Navigate the workspace\n#### Search for workspace objects\n##### Search by text string\n\nTo search for a text string, type the string into the search field and then press Enter. The system searches the names of all objects in the workspace that you have access to. It also searches text in notebook commands, but not in non-notebook files. \nYou can place quotation marks around your search entry to narrow search results to only documents that contain your exact phrase. \nExact match search supports the following: \n* Basic quotation marks (for example, `\"spark.sql(\"`)\n* Escaped quotation marks (for example, `\"spark.sql(\\\"select\"`) \nExact match search doesn\u2019t support the following: \n* With quotation marks and without quotation marks (for example, `\"spark.sql\" partition`)\n* Multiple quotation marks (for example, `\"spark.sql\" \"partition\"`)\n\n#### Search for workspace objects\n##### Semantic search\n\nPreview \nThis feature is in [Public Preview](https://docs.databricks.com/release-notes/release-types.html). \nYou can use natural language to search Unity Catalog tables. Search returns results that have related semantic meaning. 
\nFor example, the search query \u201cWhat should I use for geographies\u201d focuses on \u201cgeographies\u201d and finds related terms containing geographic attributes such as cities, countries, territories, and geo-locations. \nSearch can also understand patterns in your search queries by separating what might be a search term from a filter, which means that natural language queries are even more powerful. \nFor example, the search query \u201cShow me tables about inspections\u201d is broken down so that \u201cinspections\u201d is the key term and \u201ctable\u201d is the type of object the user is searching for.\n\n", "chunk_id": "57e498fc654750e303e68771cf081b42", "url": "https://docs.databricks.com/search/index.html"} +{"chunked_text": "# Get started: Account and workspace setup\n## Navigate the workspace\n#### Search for workspace objects\n##### Limit search to a specific object type\n\nYou can search for items by type (such as files, folders, notebooks, libraries, tables, or repos) by clicking the object type on the **Search results** page, either from the **Type** drop-down list or from the tabs on the right side of the page. A text string is not required. If you leave the text field blank and then press Enter, the system searches for all objects of that type. Click a name from the list to open that item in the workspace. You can also use dropdown menus to further narrow search results for items of a specific type, such as by owner or last-modified date. \nYou can also specify filters in your search query in the search bar at the top of the UI. For example, you can include the following in your search query to search for tables you own: `type:table owner:me`. 
To learn more about how to specify your filters via syntax, apply filters on the **Search results** page and see how the query in the search bar automatically updates.\n\n#### Search for workspace objects\n##### Popularity\n\nSearch uses popularity signals based on how often other users in your workspace are interacting with specific tables to improve how tables are ranked. \nWithout popularity boosting, you would have to query the tables returned in the search results to know which is the authoritative table. With popularity boosting, the most popular table is ranked higher so you don\u2019t have to guess which is the correct one. The popularity indicator ![Popularity indicator icon](https://docs.databricks.com/_images/popularity-indicator.png) next to the table name in the search results reflects object ranking. You can also sort search results by popularity.\n\n#### Search for workspace objects\n##### Knowledge cards\n\nWhen search can identify what you\u2019re looking for with high confidence, the top search result turns into a knowledge card. A knowledge card provides additional object metadata. Knowledge cards are supported for Unity Catalog managed tables. \n![Example knowledge card](https://docs.databricks.com/_images/knowledge-card.png)\n\n", "chunk_id": "56139bba260c3992c5b51657a0763f0f", "url": "https://docs.databricks.com/search/index.html"} +{"chunked_text": "# Get started: Account and workspace setup\n## Navigate the workspace\n#### Search for workspace objects\n##### Search tables and models in Unity Catalog-enabled workspaces\n\nIn workspaces [enabled for Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/enable-workspaces.html), you can search for tables and models registered in Unity Catalog. 
You can search on any of the following: \n* Table, view, or model names.\n* Table, view, or model comments.\n* Table or view column names.\n* Table or view column comments.\n* Table or view [tag keys](https://docs.databricks.com/search/index.html#tags). \nTo filter search results by parent catalog, parent schema, owner, or tag on the **Search results** page, click the **Type** drop-down menu and select **Tables**. The filter drop-down menus appear at the top of the page. \nYou can also sort the results by the table\u2019s popularity. \nSearch results don\u2019t include: \n* Tables, views, and models that you don\u2019t have permission to see. \nIn other words, for a table or model to appear in your search results, you must have at least the `SELECT` privilege on that table or `EXECUTE` privilege on the model, the `USE SCHEMA` privilege on its parent schema, and the `USE CATALOG` privilege on its parent catalog. Metastore admins have those privileges by default. All other users must be granted those privileges. See [Unity Catalog privileges and securable objects](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/privileges.html).\n* Tables and views in the legacy Hive metastore (that is, in the `hive_metastore` catalog). \nTo upgrade these tables to Unity Catalog and make them available for search, follow the instructions in [Upgrade Hive tables and views to Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/migrate.html).\n* Models in the workspace model registry. \nTo upgrade ML workflows to create models in Unity Catalog, see [Upgrade ML workflows to target models in Unity Catalog](https://docs.databricks.com/machine-learning/manage-model-lifecycle/upgrade-workflows.html). \n### Use tags to search for tables \nYou can use the Databricks workspace search bar to search for tables, views, and table columns using tag keys and tag values. You can also use tag keys to filter tables and views using workspace search. 
You cannot search for other tagged objects, like catalogs, schemas, or volumes. See also [Apply tags to Unity Catalog securable objects](https://docs.databricks.com/data-governance/unity-catalog/tags.html). \nOnly tables and views that you have permission to see appear in search results. \nTo search for tables, views, and columns using tags: \n1. Click the **Search** field in the top bar of the Databricks workspace or use the keyboard shortcut Command-P. \nYou cannot use the filter field in Catalog Explorer to search by tag.\n2. Enter your search criteria. Search for tagged tables or columns by entering the table or column tag key or value. You must use the exact tag key or value term. \nIf you want to search by tag key alone, use the syntax: `tag:`. To search by both tag key and tag value, omit `tag:`. \n![Search for tables by tag key](https://docs.databricks.com/_images/tag-search.png) \nTo filter table search results using tag keys: \n1. Click the **Search** field in the top bar of the Databricks workspace or use the keyboard shortcut Command-P.\n2. Enter a search term or leave the search field blank.\n3. On the **Search results** page, click the **Type** drop-down menu and select **Tables**.\n4. Use the **Tag** filter drop-down menu to select the tag key.\n\n", "chunk_id": "10fc7d4946722fb403c462d7258ca96b", "url": "https://docs.databricks.com/search/index.html"} +{"chunked_text": "# Technology partners\n### Connect to semantic layer partners using Partner Connect\n\nTo connect your Databricks workspace to a semantic layer partner solution using Partner Connect, you typically follow the steps in this article. \nImportant \nBefore you follow the steps in this article, see the appropriate partner article for important partner-specific information. There might be differences in the connection steps between partner solutions. 
Some partner solutions also allow you to integrate with Databricks SQL warehouses (formerly Databricks SQL endpoints) or Databricks clusters, but not both.\n\n### Connect to semantic layer partners using Partner Connect\n#### Requirements\n\nSee the [requirements](https://docs.databricks.com/partner-connect/index.html#requirements) for using Partner Connect. \nImportant \nFor partner-specific requirements, see the appropriate partner article.\n\n", "chunk_id": "8b305d53b0499e69f0baa1c090bdf6df", "url": "https://docs.databricks.com/partner-connect/semantic-layer.html"} +{"chunked_text": "# Technology partners\n### Connect to semantic layer partners using Partner Connect\n#### Steps to connect to a semantic layer partner\n\nTo connect your Databricks workspace to a semantic layer partner solution, follow the steps in this section. \nTip \nIf you have an existing partner account, Databricks recommends that you follow the steps to connect to the partner solution manually in the appropriate partner article. This is because the connection experience in Partner Connect is optimized for new partner accounts. \n1. In the sidebar, click ![Partner Connect button](https://docs.databricks.com/_images/partner-connect.png) **Partner Connect**.\n2. Click the partner tile. \nNote \nIf the partner tile has a check mark icon inside it, an administrator has already used Partner Connect to connect the partner to your workspace. Skip to step 8. The partner uses the email address for your Databricks account to prompt you to sign in to your existing partner account.\n3. If there are SQL warehouses in your workspace, select a SQL warehouse from the drop-down list. If your SQL warehouse is stopped, click **Start**.\n4. If there are no SQL warehouses in your workspace, do the following: \n1. Click **Create warehouse**. A new tab opens in your browser that displays the **New SQL Warehouse** page in the Databricks SQL UI.\n2. 
Follow the steps in [Create a SQL warehouse](https://docs.databricks.com/compute/sql-warehouse/create.html).\n3. Return to the Partner Connect tab in your browser, then close the partner tile.\n4. Re-open the partner tile.\n5. Select the SQL warehouse you just created from the drop-down list.\n5. Select a catalog and a schema from the drop-down lists, then click **Add**. You can repeat this step to add multiple schemas. \nNote \nIf your workspace is Unity Catalog-enabled, but the partner doesn\u2019t support Unity Catalog with Partner Connect, the workspace default catalog is used. If your workspace isn\u2019t Unity Catalog-enabled, `hive_metastore` is used.\n6. Click **Next**. \nPartner Connect creates the following resources in your workspace: \n* A Databricks [service principal](https://docs.databricks.com/admin/users-groups/service-principals.html) named **`_USER`**.\n* A Databricks [personal access token](https://docs.databricks.com/admin/users-groups/service-principals.html) that is associated with the **`_USER`** service principal. \nPartner Connect also grants the following privileges to the **`_USER`** service principal: \n* (Unity Catalog) `USE CATALOG`: Required to interact with objects within the selected catalog.\n* (Unity Catalog) `USE SCHEMA`: Required to interact with objects within the selected schema.\n* (Hive metastore) `USAGE`: Required to grant the `SELECT` and `READ METADATA` privileges for the schemas you selected.\n* `SELECT`: Grants the ability to read the schemas you selected.\n* (Hive metastore) `READ METADATA`: Grants the ability to read metadata for the schemas you selected.\n* **CAN\\_USE**: Grants permissions to use the SQL warehouse you selected.\n7. Click **Next**. \nThe **Email** box displays the email address for your Databricks account. The partner uses this email address to prompt you to either create a new partner account or sign in to your existing partner account.\n8. Click **Connect to ``** or **Sign in**. 
\nA new tab opens in your web browser, which displays the partner website.\n9. Complete the on-screen instructions on the partner website to create your trial partner account or sign in to your existing partner account.\n\n", "chunk_id": "5e6390b0c438a51caceb6789bdc401db", "url": "https://docs.databricks.com/partner-connect/semantic-layer.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks notebooks\n#### Share code between Databricks notebooks\n\nThis article describes how to use files to modularize your code, including how to create and import Python files. \nDatabricks also supports multi-task jobs which allow you to combine notebooks into workflows with complex dependencies. For more information, see [Create and run Databricks Jobs](https://docs.databricks.com/workflows/jobs/create-run-jobs.html).\n\n#### Share code between Databricks notebooks\n##### Modularize your code using files\n\nWith Databricks Runtime 11.3 LTS and above, you can create and manage source code files in the Databricks workspace, and then import these files into your notebooks as needed. You can also use a Databricks repo to sync your files with a Git repository. For details, see [Work with Python and R modules](https://docs.databricks.com/files/workspace-modules.html) and [Git integration with Databricks Git folders](https://docs.databricks.com/repos/index.html).\n\n#### Share code between Databricks notebooks\n##### Create a file\n\nTo create a file: \n1. [Navigate to a folder in the workspace](https://docs.databricks.com/workspace/workspace-objects.html).\n2. Right-click on the folder name and select **Create > File**.\n3. Enter a name for the file and click **Create File** or press **Enter**. The file opens in an editor window. Changes are saved automatically.\n\n#### Share code between Databricks notebooks\n##### Open a file\n\nNavigate to the file in your workspace and click on it. 
The file path is displayed when you hover over the name of the file.\n\n", "chunk_id": "b1308b8146927b0046cb185f83e7ebeb", "url": "https://docs.databricks.com/notebooks/share-code.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks notebooks\n#### Share code between Databricks notebooks\n##### Import a file into a notebook\n\nYou can import a file into a notebook using standard Python import statements. \nSuppose you have the following file: \n![file that defines functions](https://docs.databricks.com/_images/functions.png) \nYou can import that file into a notebook and call the functions defined in the file: \n![import file into notebook](https://docs.databricks.com/_images/call-functions.png)\n\n#### Share code between Databricks notebooks\n##### Run a file\n\nYou can run a file from the editor. This is useful for testing. To run a file, place your cursor in the code area and press **Shift + Enter** to run the cell, or highlight code in the cell and press **Shift + Ctrl + Enter** to run only the selected code.\n\n#### Share code between Databricks notebooks\n##### Delete a file\n\nSee [Folders](https://docs.databricks.com/workspace/workspace-objects.html#folders) and [Workspace object operations](https://docs.databricks.com/workspace/workspace-objects.html#objects) for information about how to access the workspace menu and delete files or other items in the workspace.\n\n#### Share code between Databricks notebooks\n##### Rename a file\n\nTo change the title of an open file, click the title and edit inline or click **File > Rename**.\n\n#### Share code between Databricks notebooks\n##### Control access to a file\n\nIf your Databricks account has the [Premium plan or above](https://databricks.com/product/pricing/platform-addons), you can use [Workspace access control](https://docs.databricks.com/security/auth-authz/access-control/index.html#files) to control who has access to a file.\n\n", "chunk_id": 
"4a35d52f8fc83906035169e7f680624a", "url": "https://docs.databricks.com/notebooks/share-code.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n\nImportant \nThis article documents the 2.0 version of the Jobs API. However, Databricks recommends using [Jobs API 2.1](https://docs.databricks.com/api/workspace/jobs) for new and existing clients and scripts. For details on the changes from the 2.0 to 2.1 versions, see [Updating from Jobs API 2.0 to 2.1](https://docs.databricks.com/workflows/jobs/jobs-api-updates.html). \nThe Jobs API allows you to create, edit, and delete jobs. The maximum allowed size of a request to the Jobs API is 10MB. \nFor details about updates to the Jobs API that support orchestration of multiple tasks with Databricks jobs, see [Updating from Jobs API 2.0 to 2.1](https://docs.databricks.com/workflows/jobs/jobs-api-updates.html). \nWarning \nYou should never hard code secrets or store them in plain text. Use the [Secrets API](https://docs.databricks.com/api/workspace/secrets) to manage secrets in the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html). Use the [Secrets utility (dbutils.secrets)](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-secrets) to reference secrets in notebooks and jobs. \nNote \nIf you receive a 500-level error when making Jobs API requests, Databricks recommends retrying requests for up to 10 min (with a minimum 30 second interval between retries). \nImportant \nTo access Databricks REST APIs, you must [authenticate](https://docs.databricks.com/dev-tools/auth/index.html).\n\n", "chunk_id": "cb5f291e68876cffcc67aa3fc79de097", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Create\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/create` | `POST` | \nCreate a new job. 
\n### Example \nThis example creates a job that runs a JAR task at 10:15pm each night. \n#### Request \n```\ncurl --netrc --request POST \\\nhttps:///api/2.0/jobs/create \\\n--data @create-job.json \\\n| jq .\n\n``` \n`create-job.json`: \n```\n{\n\"name\": \"Nightly model training\",\n\"new_cluster\": {\n\"spark_version\": \"7.3.x-scala2.12\",\n\"node_type_id\": \"r3.xlarge\",\n\"aws_attributes\": {\n\"availability\": \"ON_DEMAND\"\n},\n\"num_workers\": 10\n},\n\"libraries\": [\n{\n\"jar\": \"dbfs:/my-jar.jar\"\n},\n{\n\"maven\": {\n\"coordinates\": \"org.jsoup:jsoup:1.7.2\"\n}\n}\n],\n\"email_notifications\": {\n\"on_start\": [],\n\"on_success\": [],\n\"on_failure\": []\n},\n\"webhook_notifications\": {\n\"on_start\": [\n{\n\"id\": \"bf2fbd0a-4a05-4300-98a5-303fc8132233\"\n}\n],\n\"on_success\": [\n{\n\"id\": \"bf2fbd0a-4a05-4300-98a5-303fc8132233\"\n}\n],\n\"on_failure\": []\n},\n\"notification_settings\": {\n\"no_alert_for_skipped_runs\": false,\n\"no_alert_for_canceled_runs\": false,\n\"alert_on_last_attempt\": false\n},\n\"timeout_seconds\": 3600,\n\"max_retries\": 1,\n\"schedule\": {\n\"quartz_cron_expression\": \"0 15 22 * * ?\",\n\"timezone_id\": \"America/Los_Angeles\"\n},\n\"spark_jar_task\": {\n\"main_class_name\": \"com.databricks.ComputeModels\"\n}\n}\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* The contents of `create-job.json` with fields that are appropriate for your solution. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file and [jq](https://stedolan.github.io/jq/). 
\n#### Response \n```\n{\n\"job_id\": 1\n}\n\n``` \n### Request structure \nImportant \n* When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.\n* When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `existing_cluster_id` OR `new_cluster` | `STRING` OR [NewCluster](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsclusterspecnewcluster) | If existing\\_cluster\\_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability. If new\\_cluster, a description of a cluster that will be created for each run. If specifying a [PipelineTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobspipelinetask), this field can be empty. |\n| `notebook_task` OR `spark_jar_task` OR `spark_python_task` OR `spark_submit_task` OR `pipeline_task` OR `run_job_task` | [NotebookTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsnotebooktask) OR [SparkJarTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobssparkjartask) OR [SparkPythonTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobssparkpythontask) OR [SparkSubmitTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobssparksubmittask) OR [PipelineTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobspipelinetask) OR [RunJobTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrunjobtask) | If notebook\\_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark\\_jar\\_task. 
If spark\\_jar\\_task, indicates that this job should run a JAR. If spark\\_python\\_task, indicates that this job should run a Python file. If spark\\_submit\\_task, indicates that this job should be launched by the spark submit script. If pipeline\\_task, indicates that this job should run a Delta Live Tables pipeline. If run\\_job\\_task, indicates that this job should run another job. |\n| `name` | `STRING` | An optional name for the job. The default value is `Untitled`. |\n| `libraries` | An array of [Library](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#managedlibrarieslibrary) | An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list. |\n| `email_notifications` | [JobEmailNotifications](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobemailnotifications) | An optional set of email addresses notified when runs of this job begin and complete and when this job is deleted. The default behavior is to not send any emails. |\n| `webhook_notifications` | [WebhookNotifications](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobsystemnotifications) | An optional set of system destinations to notify when runs of this job begin, complete, or fail. |\n| `notification_settings` | [JobNotificationSettings](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobnotificationsettings) | Optional notification settings that are used when sending notifications to each of the `email_notifications` and `webhook_notifications` for this job. |\n| `timeout_seconds` | `INT32` | An optional timeout applied to each run of this job. The default behavior is to have no timeout. |\n| `max_retries` | `INT32` | An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with the `FAILED` result\\_state or `INTERNAL_ERROR` `life_cycle_state`. 
The value -1 means to retry indefinitely and the value 0 means to never retry. The default behavior is to never retry. |\n| `min_retry_interval_millis` | `INT32` | An optional minimal interval in milliseconds between the start of the failed run and the subsequent retry run. The default behavior is that unsuccessful runs are immediately retried. |\n| `retry_on_timeout` | `BOOL` | An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout. |\n| `schedule` | [CronSchedule](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobscronschedule) | An optional periodic schedule for this job. The default behavior is that the job runs when triggered by clicking **Run Now** in the Jobs UI or sending an API request to `runNow`. |\n| `max_concurrent_runs` | `INT32` | An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. This is useful for example if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs which differ by their input parameters. This setting affects only new runs. For example, suppose the job\u2019s concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won\u2019t kill any of the active runs. However, from then on, new runs are skipped unless there are fewer than 3 active runs. This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped. The default behavior is to allow only 1 concurrent run. | \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `job_id` | `INT64` | The canonical identifier for the newly created job. 
|\n\n", "chunk_id": "aea8502364431a778b7b31c1b7512651", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### List\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/list` | `GET` | \nList all jobs. \n### Example \n#### Request \n```\ncurl --netrc --request GET \\\nhttps:///api/2.0/jobs/list \\\n| jq .\n\n``` \nReplace `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file and [jq](https://stedolan.github.io/jq/). \n#### Response \n```\n{\n\"jobs\": [\n{\n\"job_id\": 1,\n\"settings\": {\n\"name\": \"Nightly model training\",\n\"new_cluster\": {\n\"spark_version\": \"7.3.x-scala2.12\",\n\"node_type_id\": \"r3.xlarge\",\n\"aws_attributes\": {\n\"availability\": \"ON_DEMAND\"\n},\n\"num_workers\": 10\n},\n\"libraries\": [\n{\n\"jar\": \"dbfs:/my-jar.jar\"\n},\n{\n\"maven\": {\n\"coordinates\": \"org.jsoup:jsoup:1.7.2\"\n}\n}\n],\n\"email_notifications\": {\n\"on_start\": [],\n\"on_success\": [],\n\"on_failure\": []\n},\n\"timeout_seconds\": 100000000,\n\"max_retries\": 1,\n\"schedule\": {\n\"quartz_cron_expression\": \"0 15 22 * * ?\",\n\"timezone_id\": \"America/Los_Angeles\",\n\"pause_status\": \"UNPAUSED\"\n},\n\"spark_jar_task\": {\n\"main_class_name\": \"com.databricks.ComputeModels\"\n}\n},\n\"created_time\": 1457570074236\n}\n]\n}\n\n``` \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `jobs` | An array of [Job](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjob) | The list of jobs. 
|\n\n", "chunk_id": "1f78b7d353a737e8783d63b5b035f772", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Delete\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/delete` | `POST` | \nDelete a job and send an email to the addresses specified in `JobSettings.email_notifications`. No action occurs if the job has already been removed. After the job is removed, neither its details nor its run history is visible in the Jobs UI or API. The job is guaranteed to be removed upon completion of this request. However, runs that were active before the receipt of this request may still be active. They will be terminated asynchronously. \n### Example \n```\ncurl --netrc --request POST \\\nhttps:///api/2.0/jobs/delete \\\n--data '{ \"job_id\": }'\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* `` with the ID of the job, for example `123`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `job_id` | `INT64` | The canonical identifier of the job to delete. This field is required. |\n\n", "chunk_id": "0bb7e6cec57f673695fe1c53e2decc05", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Get\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/get` | `GET` | \nRetrieve information about a single job. 
\n### Example \n#### Request \n```\ncurl --netrc --request GET \\\n'https:///api/2.0/jobs/get?job_id=' \\\n| jq .\n\n``` \nOr: \n```\ncurl --netrc --get \\\nhttps:///api/2.0/jobs/get \\\n--data job_id= \\\n| jq .\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* `` with the ID of the job, for example `123`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file and [jq](https://stedolan.github.io/jq/). \n#### Response \n```\n{\n\"job_id\": 1,\n\"settings\": {\n\"name\": \"Nightly model training\",\n\"new_cluster\": {\n\"spark_version\": \"7.3.x-scala2.12\",\n\"node_type_id\": \"r3.xlarge\",\n\"aws_attributes\": {\n\"availability\": \"ON_DEMAND\"\n},\n\"num_workers\": 10\n},\n\"libraries\": [\n{\n\"jar\": \"dbfs:/my-jar.jar\"\n},\n{\n\"maven\": {\n\"coordinates\": \"org.jsoup:jsoup:1.7.2\"\n}\n}\n],\n\"email_notifications\": {\n\"on_start\": [],\n\"on_success\": [],\n\"on_failure\": []\n},\n\"webhook_notifications\": {\n\"on_start\": [\n{\n\"id\": \"bf2fbd0a-4a05-4300-98a5-303fc8132233\"\n}\n],\n\"on_success\": [\n{\n\"id\": \"bf2fbd0a-4a05-4300-98a5-303fc8132233\"\n}\n],\n\"on_failure\": []\n},\n\"notification_settings\": {\n\"no_alert_for_skipped_runs\": false,\n\"no_alert_for_canceled_runs\": false,\n\"alert_on_last_attempt\": false\n},\n\"timeout_seconds\": 100000000,\n\"max_retries\": 1,\n\"schedule\": {\n\"quartz_cron_expression\": \"0 15 22 * * ?\",\n\"timezone_id\": \"America/Los_Angeles\",\n\"pause_status\": \"UNPAUSED\"\n},\n\"spark_jar_task\": {\n\"main_class_name\": \"com.databricks.ComputeModels\"\n}\n},\n\"created_time\": 1457570074236\n}\n\n``` \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `job_id` | `INT64` | The canonical identifier of the job to retrieve information about. This field is required. 
| \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `job_id` | `INT64` | The canonical identifier for this job. |\n| `creator_user_name` | `STRING` | The creator user name. This field won\u2019t be included in the response if the user has been deleted. |\n| `settings` | [JobSettings](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettings) | Settings for this job and all of its runs. These settings can be updated using the [Reset](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsserviceresetjob) or [Update](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsserviceupdatejob) endpoints. |\n| `created_time` | `INT64` | The time at which this job was created in epoch milliseconds (milliseconds since 1/1/1970 UTC). |\n\n", "chunk_id": "d39d44b691f15f70d18fc3f5eaa40ec2", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Reset\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/reset` | `POST` | \nOverwrite all settings for a specific job. Use the [Update](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsserviceupdatejob) endpoint to update job settings partially. \n### Example \nThis example request makes job 2 identical to job 1 in the [create](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsservicecreatejob) example. 
\n```\ncurl --netrc --request POST \\\nhttps:///api/2.0/jobs/reset \\\n--data @reset-job.json \\\n| jq .\n\n``` \n`reset-job.json`: \n```\n{\n\"job_id\": 2,\n\"new_settings\": {\n\"name\": \"Nightly model training\",\n\"new_cluster\": {\n\"spark_version\": \"7.3.x-scala2.12\",\n\"node_type_id\": \"r3.xlarge\",\n\"aws_attributes\": {\n\"availability\": \"ON_DEMAND\"\n},\n\"num_workers\": 10\n},\n\"libraries\": [\n{\n\"jar\": \"dbfs:/my-jar.jar\"\n},\n{\n\"maven\": {\n\"coordinates\": \"org.jsoup:jsoup:1.7.2\"\n}\n}\n],\n\"email_notifications\": {\n\"on_start\": [],\n\"on_success\": [],\n\"on_failure\": []\n},\n\"webhook_notifications\": {\n\"on_start\": [\n{\n\"id\": \"bf2fbd0a-4a05-4300-98a5-303fc8132233\"\n}\n],\n\"on_success\": [\n{\n\"id\": \"bf2fbd0a-4a05-4300-98a5-303fc8132233\"\n}\n],\n\"on_failure\": []\n},\n\"notification_settings\": {\n\"no_alert_for_skipped_runs\": false,\n\"no_alert_for_canceled_runs\": false,\n\"alert_on_last_attempt\": false\n},\n\"timeout_seconds\": 100000000,\n\"max_retries\": 1,\n\"schedule\": {\n\"quartz_cron_expression\": \"0 15 22 * * ?\",\n\"timezone_id\": \"America/Los_Angeles\",\n\"pause_status\": \"UNPAUSED\"\n},\n\"spark_jar_task\": {\n\"main_class_name\": \"com.databricks.ComputeModels\"\n}\n}\n}\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* The contents of `reset-job.json` with fields that are appropriate for your solution. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file and [jq](https://stedolan.github.io/jq/). \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `job_id` | `INT64` | The canonical identifier of the job to reset. This field is required. |\n| `new_settings` | [JobSettings](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettings) | The new settings of the job. 
These settings completely replace the old settings. Changes to the field `JobSettings.timeout_seconds` are applied to active runs. Changes to other fields are applied to future runs only. |\n\n", "chunk_id": "3ef3d8c002a2eae330e75d4954625f58", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Update\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/update` | `POST` | \nAdd, change, or remove specific settings of an existing job. Use the [Reset](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsserviceresetjob) endpoint to overwrite all job settings. \n### Example \nThis example request removes libraries and adds email notification settings to job 1 defined in the [create](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsservicecreatejob) example. \n```\ncurl --netrc --request POST \\\nhttps:///api/2.0/jobs/update \\\n--data @update-job.json \\\n| jq .\n\n``` \n`update-job.json`: \n```\n{\n\"job_id\": 1,\n\"new_settings\": {\n\"existing_cluster_id\": \"1201-my-cluster\",\n\"email_notifications\": {\n\"on_start\": [ \"someone@example.com\" ],\n\"on_success\": [],\n\"on_failure\": []\n}\n},\n\"fields_to_remove\": [\"libraries\"]\n}\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* The contents of `update-job.json` with fields that are appropriate for your solution. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file and [jq](https://stedolan.github.io/jq/). \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `job_id` | `INT64` | The canonical identifier of the job to update. This field is required. 
|\n| `new_settings` | [JobSettings](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettings) | The new settings for the job. Top-level fields specified in `new_settings`, except for arrays, are completely replaced. Arrays are merged based on the respective key fields, such as `task_key` or `job_cluster_key`, and array entries with the same key are completely replaced. Except for array merging, partially updating nested fields is not supported. Changes to the field `JobSettings.timeout_seconds` are applied to active runs. Changes to other fields are applied to future runs only. |\n| `fields_to_remove` | An array of `STRING` | Remove top-level fields in the job settings. Removing nested fields is not supported, except for entries from the `tasks` and `job_clusters` arrays. For example, the following is a valid argument for this field: `[\"libraries\", \"schedule\", \"tasks/task_1\", \"job_clusters/Default\"]` This field is optional. |\n\n", "chunk_id": "3b74d215ac578f2a2a5d595d8356f141", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Run now\n\nImportant \n* A workspace is limited to 1000 concurrent task runs. A `429 Too Many Requests` response is returned when you request a run that cannot start immediately.\n* The number of jobs a workspace can create in an hour is limited to 10000 (includes \u201cruns submit\u201d). This limit also affects jobs created by the REST API and notebook workflows. \n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/run-now` | `POST` | \nRun a job now and return the `run_id` of the triggered run. 
\nTip \nIf you invoke [Create](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsservicecreatejob) together with [Run now](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsservicerunnow), you can use the\n[Runs submit](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsservicesubmitrun) endpoint instead, which allows you to submit your workload directly without having to create a job. \n### Example \n```\ncurl --netrc --request POST \\\nhttps:///api/2.0/jobs/run-now \\\n--data @run-job.json \\\n| jq .\n\n``` \n`run-job.json`: \nAn example request for a notebook job: \n```\n{\n\"job_id\": 1,\n\"notebook_params\": {\n\"name\": \"john doe\",\n\"age\": \"35\"\n}\n}\n\n``` \nAn example request for a JAR job: \n```\n{\n\"job_id\": 2,\n\"jar_params\": [ \"john doe\", \"35\" ]\n}\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* The contents of `run-job.json` with fields that are appropriate for your solution. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file and [jq](https://stedolan.github.io/jq/). \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `job_id` | `INT64` | |\n| `jar_params` | An array of `STRING` | A list of parameters for jobs with JAR tasks, e.g. `\"jar_params\": [\"john doe\", \"35\"]`. The parameters will be used to invoke the main function of the main class specified in the Spark JAR task. If not specified upon `run-now`, it will default to an empty list. jar\\_params cannot be specified in conjunction with notebook\\_params. The JSON representation of this field (i.e. `{\"jar_params\":[\"john doe\",\"35\"]}`) cannot exceed 10,000 bytes. 
|\n| `notebook_params` | A map of [ParamPair](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsparampair) | A map from keys to values for jobs with notebook task, e.g. `\"notebook_params\": {\"name\": \"john doe\", \"age\": \"35\"}`. The map is passed to the notebook and is accessible through the [dbutils.widgets.get](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets) function. If not specified upon `run-now`, the triggered run uses the job\u2019s base parameters. You cannot specify notebook\\_params in conjunction with jar\\_params. The JSON representation of this field (i.e. `{\"notebook_params\":{\"name\":\"john doe\",\"age\":\"35\"}}`) cannot exceed 10,000 bytes. |\n| `python_params` | An array of `STRING` | A list of parameters for jobs with Python tasks, e.g. `\"python_params\": [\"john doe\", \"35\"]`. The parameters will be passed to Python file as command-line parameters. If specified upon `run-now`, it would overwrite the parameters specified in job setting. The JSON representation of this field (i.e. `{\"python_params\":[\"john doe\",\"35\"]}`) cannot exceed 10,000 bytes. |\n| `spark_submit_params` | An array of `STRING` | A list of parameters for jobs with spark submit task, e.g. `\"spark_submit_params\": [\"--class\", \"org.apache.spark.examples.SparkPi\"]`. The parameters will be passed to spark-submit script as command-line parameters. If specified upon `run-now`, it would overwrite the parameters specified in job setting. The JSON representation of this field cannot exceed 10,000 bytes. |\n| `idempotency_token` | `STRING` | An optional token to guarantee the idempotency of job run requests. If a run with the provided token already exists, the request does not create a new run but returns the ID of the existing run instead. If a run with the provided token is deleted, an error is returned. If you specify the idempotency token, upon failure you can retry until the request succeeds. 
Databricks guarantees that exactly one run is launched with that idempotency token. This token must have at most 64 characters. For more information, see [How to ensure idempotency for jobs](https://kb.databricks.com/jobs/jobs-idempotency.html). | \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `run_id` | `INT64` | The globally unique ID of the newly triggered run. |\n| `number_in_job` | `INT64` | The sequence number of this run among all runs of the job. |\n\n", "chunk_id": "f6fe78afa866105ef63f7ba708ffb36a", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Runs submit\n\nImportant \n* A workspace is limited to 1000 concurrent task runs. A `429 Too Many Requests` response is returned when you request a run that cannot start immediately.\n* The number of jobs a workspace can create in an hour is limited to 10000 (includes \u201cruns submit\u201d). This limit also affects jobs created by the REST API and notebook workflows. \n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/runs/submit` | `POST` | \nSubmit a one-time run. This endpoint allows you to submit a workload directly without creating a job. Use the `jobs/runs/get` API to check the run state after the job is submitted. 
\n### Example \n#### Request \n```\ncurl --netrc --request POST \\\nhttps:///api/2.0/jobs/runs/submit \\\n--data @submit-job.json \\\n| jq .\n\n``` \n`submit-job.json`: \n```\n{\n\"run_name\": \"my spark task\",\n\"new_cluster\": {\n\"spark_version\": \"7.3.x-scala2.12\",\n\"node_type_id\": \"r3.xlarge\",\n\"aws_attributes\": {\n\"availability\": \"ON_DEMAND\"\n},\n\"num_workers\": 10\n},\n\"libraries\": [\n{\n\"jar\": \"dbfs:/my-jar.jar\"\n},\n{\n\"maven\": {\n\"coordinates\": \"org.jsoup:jsoup:1.7.2\"\n}\n}\n],\n\"spark_jar_task\": {\n\"main_class_name\": \"com.databricks.ComputeModels\"\n}\n}\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* The contents of `submit-job.json` with fields that are appropriate for your solution. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file and [jq](https://stedolan.github.io/jq/). \n#### Response \n```\n{\n\"run_id\": 123\n}\n\n``` \n### Request structure \nImportant \n* When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.\n* When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `existing_cluster_id` OR `new_cluster` | `STRING` OR [NewCluster](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsclusterspecnewcluster) | If existing\\_cluster\\_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability. If new\\_cluster, a description of a cluster that will be created for each run. 
If specifying a [PipelineTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobspipelinetask), then this field can be empty. |\n| `notebook_task` OR `spark_jar_task` OR `spark_python_task` OR `spark_submit_task` OR `pipeline_task` OR `run_job_task` | [NotebookTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsnotebooktask) OR [SparkJarTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobssparkjartask) OR [SparkPythonTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobssparkpythontask) OR [SparkSubmitTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobssparksubmittask) OR [PipelineTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobspipelinetask) OR [RunJobTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrunjobtask) | If notebook\\_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark\\_jar\\_task. If spark\\_jar\\_task, indicates that this job should run a JAR. If spark\\_python\\_task, indicates that this job should run a Python file. If spark\\_submit\\_task, indicates that this job should be launched by the spark submit script. If pipeline\\_task, indicates that this job should run a Delta Live Tables pipeline. If run\\_job\\_task, indicates that this job should run another job. |\n| `run_name` | `STRING` | An optional name for the run. The default value is `Untitled`. |\n| `webhook_notifications` | [WebhookNotifications](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobsystemnotifications) | An optional set of system destinations to notify when runs of this job begin, complete, or fail. 
|\n| `notification_settings` | [JobNotificationSettings](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobnotificationsettings) | Optional notification settings that are used when sending notifications to each of the `webhook_notifications` for this run. |\n| `libraries` | An array of [Library](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#managedlibrarieslibrary) | An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list. |\n| `timeout_seconds` | `INT32` | An optional timeout applied to each run of this job. The default behavior is to have no timeout. |\n| `idempotency_token` | `STRING` | An optional token to guarantee the idempotency of job run requests. If a run with the provided token already exists, the request does not create a new run but returns the ID of the existing run instead. If a run with the provided token is deleted, an error is returned. If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one run is launched with that idempotency token. This token must have at most 64 characters. For more information, see [How to ensure idempotency for jobs](https://kb.databricks.com/jobs/jobs-idempotency.html). | \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `run_id` | `INT64` | The canonical identifier for the newly submitted run. |\n\n", "chunk_id": "f6885428d4240a31c4de128db7e52c4b", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Runs list\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/runs/list` | `GET` | \nList runs in descending order by start time. \nNote \nRuns are automatically removed after 60 days. 
If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see [Export job run results](https://docs.databricks.com/workflows/jobs/monitor-job-runs.html#export-job-runs). To export using the Jobs API, see [Runs export](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsserviceexportrun). \n### Example \n#### Request \n```\ncurl --netrc --request GET \\\n'https:///api/2.0/jobs/runs/list?job_id=&active_only=&offset=&limit=&run_type=' \\\n| jq .\n\n``` \nOr: \n```\ncurl --netrc --get \\\nhttps:///api/2.0/jobs/runs/list \\\n--data 'job_id=&active_only=&offset=&limit=&run_type=' \\\n| jq .\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* `` with the ID of the job, for example `123`.\n* `` with `true` or `false`.\n* `` with the `offset` value.\n* `` with the `limit` value.\n* `` with the `run_type` value. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file and [jq](https://stedolan.github.io/jq/). 
\n#### Response \n```\n{\n\"runs\": [\n{\n\"job_id\": 1,\n\"run_id\": 452,\n\"number_in_job\": 5,\n\"state\": {\n\"life_cycle_state\": \"RUNNING\",\n\"state_message\": \"Performing action\"\n},\n\"task\": {\n\"notebook_task\": {\n\"notebook_path\": \"/Users/donald@duck.com/my-notebook\"\n}\n},\n\"cluster_spec\": {\n\"existing_cluster_id\": \"1201-my-cluster\"\n},\n\"cluster_instance\": {\n\"cluster_id\": \"1201-my-cluster\",\n\"spark_context_id\": \"1102398-spark-context-id\"\n},\n\"overriding_parameters\": {\n\"jar_params\": [\"param1\", \"param2\"]\n},\n\"start_time\": 1457570074236,\n\"end_time\": 1457570075149,\n\"setup_duration\": 259754,\n\"execution_duration\": 3589020,\n\"cleanup_duration\": 31038,\n\"run_duration\": 3879812,\n\"trigger\": \"PERIODIC\"\n}\n],\n\"has_more\": true\n}\n\n``` \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `active_only` OR `completed_only` | `BOOL` OR `BOOL` | If active\\_only is `true`, only active runs are included in the results; otherwise, lists both active and completed runs. An active run is a run in the `PENDING`, `RUNNING`, or `TERMINATING` [RunLifecycleState](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrunlifecyclestate). This field cannot be `true` when completed\\_only is `true`. If completed\\_only is `true`, only completed runs are included in the results; otherwise, lists both active and completed runs. This field cannot be `true` when active\\_only is `true`. |\n| `job_id` | `INT64` | The job for which to list runs. If omitted, the Jobs service will list runs from all jobs. |\n| `offset` | `INT32` | The offset of the first run to return, relative to the most recent run. |\n| `limit` | `INT32` | The number of runs to return. This value should be greater than 0 and less than 1000. The default value is 20. If a request specifies a limit of 0, the service will instead use the maximum limit. |\n| `run_type` | `STRING` | The type of runs to return. 
For a description of run types, see [Run](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrun). | \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `runs` | An array of [Run](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrun) | A list of runs, from most recently started to least. |\n| `has_more` | `BOOL` | If true, additional runs matching the provided filter are available for listing. |\n\n", "chunk_id": "f011476c8673b923e338d8fd857d43f1", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Runs get\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/runs/get` | `GET` | \nRetrieve the metadata of a run. \nNote \nRuns are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see [Export job run results](https://docs.databricks.com/workflows/jobs/monitor-job-runs.html#export-job-runs). To export using the Jobs API, see [Runs export](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsserviceexportrun). \n### Example \n#### Request \n```\ncurl --netrc --request GET \\\n'https:///api/2.0/jobs/runs/get?run_id=' \\\n| jq .\n\n``` \nOr: \n```\ncurl --netrc --get \\\nhttps:///api/2.0/jobs/runs/get \\\n--data run_id= \\\n| jq .\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* `` with the ID of the run, for example `123`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file and [jq](https://stedolan.github.io/jq/). 
\n#### Response \n```\n{\n\"job_id\": 1,\n\"run_id\": 452,\n\"number_in_job\": 5,\n\"state\": {\n\"life_cycle_state\": \"RUNNING\",\n\"state_message\": \"Performing action\"\n},\n\"task\": {\n\"notebook_task\": {\n\"notebook_path\": \"/Users/someone@example.com/my-notebook\"\n}\n},\n\"cluster_spec\": {\n\"existing_cluster_id\": \"1201-my-cluster\"\n},\n\"cluster_instance\": {\n\"cluster_id\": \"1201-my-cluster\",\n\"spark_context_id\": \"1102398-spark-context-id\"\n},\n\"overriding_parameters\": {\n\"jar_params\": [\"param1\", \"param2\"]\n},\n\"start_time\": 1457570074236,\n\"end_time\": 1457570075149,\n\"setup_duration\": 259754,\n\"execution_duration\": 3589020,\n\"cleanup_duration\": 31038,\n\"run_duration\": 3879812,\n\"trigger\": \"PERIODIC\"\n}\n\n``` \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `run_id` | `INT64` | The canonical identifier of the run for which to retrieve the metadata. This field is required. | \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `job_id` | `INT64` | The canonical identifier of the job that contains this run. |\n| `run_id` | `INT64` | The canonical identifier of the run. This ID is unique across all runs of all jobs. |\n| `number_in_job` | `INT64` | The sequence number of this run among all runs of the job. This value starts at 1. |\n| `original_attempt_run_id` | `INT64` | If this run is a retry of a prior run attempt, this field contains the run\\_id of the original attempt; otherwise, it is the same as the run\\_id. |\n| `state` | [RunState](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrunstate) | The result and lifecycle states of the run. |\n| `schedule` | [CronSchedule](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobscronschedule) | The cron schedule that triggered this run if it was triggered by the periodic scheduler. 
|\n| `task` | [JobTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobtask) | The task performed by the run, if any. |\n| `cluster_spec` | [ClusterSpec](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsclusterspec) | A snapshot of the job\u2019s cluster specification when this run was created. |\n| `cluster_instance` | [ClusterInstance](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsclusterinstance) | The cluster used for this run. If the run is specified to use a new cluster, this field will be set once the Jobs service has requested a cluster for the run. |\n| `overriding_parameters` | [RunParameters](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrunparameters) | The parameters used for this run. |\n| `start_time` | `INT64` | The time at which this run was started in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing, for example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued. |\n| `end_time` | `INT64` | The time at which this run ended in epoch milliseconds (milliseconds since 1/1/1970 UTC). This field will be set to 0 if the job is still running. |\n| `setup_duration` | `INT64` | The time in milliseconds it took to set up the cluster. For runs that run on new clusters this is the cluster creation time, for runs that run on existing clusters this time should be very short. The total duration of the run is the sum of the `setup_duration`, `execution_duration`, and the `cleanup_duration`. The `setup_duration` field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the `run_duration` field. |\n| `execution_duration` | `INT64` | The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error. 
The total duration of the run is the sum of the `setup_duration`, `execution_duration`, and the `cleanup_duration`. The `execution_duration` field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the `run_duration` field. |\n| `cleanup_duration` | `INT64` | The time in milliseconds it took to terminate the cluster and clean up any associated artifacts. The total duration of the run is the sum of the `setup_duration`, `execution_duration`, and the `cleanup_duration`. The `cleanup_duration` field is set to 0 for multitask job runs. The total duration of a multitask job run is the value of the `run_duration` field. |\n| `run_duration` | `INT64` | The time in milliseconds it took the job run and all of its repairs to finish. This field is only set for multitask job runs and not task runs. The duration of a task run is the sum of the `setup_duration`, `execution_duration`, and the `cleanup_duration`. |\n| `trigger` | [TriggerType](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobstriggertype) | The type of trigger that fired this run. |\n| `creator_user_name` | `STRING` | The creator user name. This field won\u2019t be included in the response if the user has been deleted. |\n| `run_page_url` | `STRING` | The URL to the detail page of the run. |\n\n", "chunk_id": "2f8c99b4c3d05dda0dde0b0371f33810", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Runs export\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/runs/export` | `GET` | \nExport and retrieve the job run task. \nNote \nOnly notebook runs can be exported in HTML format. Exporting runs of other types will fail. 
\n### Example \n#### Request \n```\ncurl --netrc --request GET \\\n'https:///api/2.0/jobs/runs/export?run_id=' \\\n| jq .\n\n``` \nOr: \n```\ncurl --netrc --get \\\nhttps:///api/2.0/jobs/runs/export \\\n--data run_id= \\\n| jq .\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* `` with the ID of the run, for example `123`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file and [jq](https://stedolan.github.io/jq/). \n#### Response \n```\n{\n\"views\": [ {\n\"content\": \"HeadBody\",\n\"name\": \"my-notebook\",\n\"type\": \"NOTEBOOK\"\n} ]\n}\n\n``` \nTo extract the HTML notebook from the JSON response, download and run this [Python script](https://docs.databricks.com/_static/examples/extract.py). \nNote \nThe notebook body in the `__DATABRICKS_NOTEBOOK_MODEL` object is encoded. \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `run_id` | `INT64` | The canonical identifier for the run. This field is required. |\n| `views_to_export` | [ViewsToExport](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsviewstoexport) | Which views to export (CODE, DASHBOARDS, or ALL). Defaults to CODE. | \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `views` | An array of [ViewItem](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsviewitem) | The exported content in HTML format (one for every view item). |\n\n", "chunk_id": "b8bf6591f4dbb2975495f799eb4d3db1", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Runs cancel\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/runs/cancel` | `POST` | \nCancel a job run. 
Because the run is canceled asynchronously, the run may still be running when this request completes. The run will be terminated shortly. If the run is already in a terminal `life_cycle_state`, this method is a no-op. \nThis endpoint validates that the `run_id` parameter is valid and for invalid parameters returns HTTP status code 400. \n### Example \n```\ncurl --netrc --request POST \\\nhttps:///api/2.0/jobs/runs/cancel \\\n--data '{ \"run_id\": }'\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* `` with the ID of the run, for example `123`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `run_id` | `INT64` | The canonical identifier of the run to cancel. This field is required. |\n\n", "chunk_id": "e5b813e44f4a61cce1c192c49cf0fc10", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Runs cancel all\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/runs/cancel-all` | `POST` | \nCancel all active runs of a job. Because the run is canceled asynchronously, it doesn\u2019t prevent new runs from being started. \nThis endpoint validates that the `job_id` parameter is valid and for invalid parameters returns HTTP status code 400. \n### Example \n```\ncurl --netrc --request POST \\\nhttps:///api/2.0/jobs/runs/cancel-all \\\n--data '{ \"job_id\": }'\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* `` with the ID of the job, for example `123`. 
\nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `job_id` | `INT64` | The canonical identifier of the job to cancel all runs of. This field is required. |\n\n", "chunk_id": "e282cd3aff61fac05606ee2f8482b476", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Runs get output\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/runs/get-output` | `GET` | \nRetrieve the output and metadata of a single task run. When a notebook task returns a value through the [dbutils.notebook.exit()](https://docs.databricks.com/notebooks/notebook-workflows.html#notebook-workflows-exit)\ncall, you can use this endpoint to retrieve that value. Databricks restricts this API to return the first 5 MB of the output. To return a larger result, you can store job results in a cloud storage service. \nThis endpoint validates that the `run_id` parameter is valid and for invalid parameters returns HTTP status code 400. \nRuns are automatically removed after 60 days. If you want to reference them beyond 60 days, you should save old run results before they expire. To export using the UI, see [Export job run results](https://docs.databricks.com/workflows/jobs/monitor-job-runs.html#export-job-runs). To export using the Jobs API, see [Runs export](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsserviceexportrun). 
\n### Example \n#### Request \n```\ncurl --netrc --request GET \\\n'https:///api/2.0/jobs/runs/get-output?run_id=' \\\n| jq .\n\n``` \nOr: \n```\ncurl --netrc --get \\\nhttps:///api/2.0/jobs/runs/get-output \\\n--data run_id= \\\n| jq .\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* `` with the ID of the run, for example `123`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file and [jq](https://stedolan.github.io/jq/). \n#### Response \n```\n{\n\"metadata\": {\n\"job_id\": 1,\n\"run_id\": 452,\n\"number_in_job\": 5,\n\"state\": {\n\"life_cycle_state\": \"TERMINATED\",\n\"result_state\": \"SUCCESS\",\n\"state_message\": \"\"\n},\n\"task\": {\n\"notebook_task\": {\n\"notebook_path\": \"/Users/someone@example.com/my-notebook\"\n}\n},\n\"cluster_spec\": {\n\"existing_cluster_id\": \"1201-my-cluster\"\n},\n\"cluster_instance\": {\n\"cluster_id\": \"1201-my-cluster\",\n\"spark_context_id\": \"1102398-spark-context-id\"\n},\n\"overriding_parameters\": {\n\"jar_params\": [\"param1\", \"param2\"]\n},\n\"start_time\": 1457570074236,\n\"setup_duration\": 259754,\n\"execution_duration\": 3589020,\n\"cleanup_duration\": 31038,\n\"run_duration\": 3879812,\n\"trigger\": \"PERIODIC\"\n},\n\"notebook_output\": {\n\"result\": \"the maybe truncated string passed to dbutils.notebook.exit()\"\n}\n}\n\n``` \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `run_id` | `INT64` | The canonical identifier for the run. For a job with multiple tasks, this is the `run_id` of a task run. See [Runs get output](https://docs.databricks.com/workflows/jobs/jobs-api-updates.html#get-runs-output). This field is required. 
| \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `notebook_output` OR `error` | [NotebookOutput](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsnotebooktasknotebookoutput) OR `STRING` | If notebook\\_output, the output of a notebook task, if available. A notebook task that terminates (either successfully or with a failure) without calling `dbutils.notebook.exit()` is considered to have an empty output. This field will be set but its result value will be empty. If error, an error message indicating why output is not available. The message is unstructured, and its exact format is subject to change. |\n| `metadata` | [Run](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrun) | All details of the run except for its output. |\n\n", "chunk_id": "39a3f7f01f271ade23b4e8d2b03aee0c", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Runs delete\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/jobs/runs/delete` | `POST` | \nDelete a non-active run. Returns an error if the run is active. \n### Example \n```\ncurl --netrc --request POST \\\nhttps:///api/2.0/jobs/runs/delete \\\n--data '{ \"run_id\": }'\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`.\n* `` with the ID of the run, for example `123`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `run_id` | `INT64` | The canonical identifier of the run to delete. 
|\n\n", "chunk_id": "e36fee9ef999d957835ad89b8479c64e", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Databricks data engineering\n## Introduction to Databricks Workflows\n#### Jobs API 2.0\n##### Data structures\n\nIn this section: \n* [AutoScale](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#autoscale)\n* [AwsAttributes](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#awsattributes)\n* [AwsAvailability](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#awsavailability)\n* [ClusterInstance](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterinstance)\n* [ClusterLogConf](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterlogconf)\n* [ClusterSpec](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterspec)\n* [ClusterTag](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clustertag)\n* [CronSchedule](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#cronschedule)\n* [DbfsStorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#dbfsstorageinfo)\n* [EbsVolumeType](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#ebsvolumetype)\n* [FileStorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#filestorageinfo)\n* [InitScriptInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#initscriptinfo)\n* [Job](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#job)\n* [JobEmailNotifications](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobemailnotifications)\n* [JobNotificationSettings](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobnotificationsettings)\n* [JobSettings](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsettings)\n* [JobTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobtask)\n* [JobsHealthRule](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobshealthrule)\n* 
[JobsHealthRules](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobshealthrules)\n* [Library](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#library)\n* [MavenLibrary](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#mavenlibrary)\n* [NewCluster](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#newcluster)\n* [NotebookOutput](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#notebookoutput)\n* [NotebookTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#notebooktask)\n* [ParamPair](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#parampair)\n* [PipelineTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#pipelinetask)\n* [PythonPyPiLibrary](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#pythonpypilibrary)\n* [RCranLibrary](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#rcranlibrary)\n* [Run](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#run)\n* [RunJobTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#runjobtask)\n* [RunLifeCycleState](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#runlifecyclestate)\n* [RunParameters](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#runparameters)\n* [RunResultState](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#runresultstate)\n* [RunState](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#runstate)\n* [S3StorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#s3storageinfo)\n* [SparkConfPair](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#sparkconfpair)\n* [SparkEnvPair](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#sparkenvpair)\n* [SparkJarTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#sparkjartask)\n* [SparkPythonTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#sparkpythontask)\n* 
[SparkSubmitTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#sparksubmittask)\n* [TriggerType](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#triggertype)\n* [ViewItem](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#viewitem)\n* [ViewType](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#viewtype)\n* [ViewsToExport](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#viewstoexport)\n* [Webhook](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#webhook)\n* [WebhookNotifications](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#webhooknotifications)\n* [WorkspaceStorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#workspacestorageinfo) \n### [AutoScale](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id50) \nRange defining the min and max number of cluster workers. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `min_workers` | `INT32` | The minimum number of workers to which the cluster can scale down when underutilized. It is also the initial number of workers the cluster will have after creation. |\n| `max_workers` | `INT32` | The maximum number of workers to which the cluster can scale up when overloaded. max\\_workers must be strictly greater than min\\_workers. | \n### [AwsAttributes](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id51) \nAttributes set during cluster creation related to Amazon Web Services. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `first_on_demand` | `INT32` | The first first\\_on\\_demand nodes of the cluster will be placed on on-demand instances. If this value is greater than 0, the cluster driver node will be placed on an on-demand instance. If this value is greater than or equal to the current cluster size, all nodes will be placed on on-demand instances. 
If this value is less than the current cluster size, first\\_on\\_demand nodes will be placed on on-demand instances and the remainder will be placed on `availability` instances. This value does not affect cluster size and cannot be mutated over the lifetime of a cluster. |\n| `availability` | [AwsAvailability](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterawsavailability) | Availability type used for all subsequent nodes past the first\\_on\\_demand ones. **Note:** If first\\_on\\_demand is zero, this availability type will be used for the entire cluster. |\n| `zone_id` | `STRING` | Identifier for the availability zone (AZ) in which the cluster resides. By default, the setting has a value of **auto**, otherwise known as Auto-AZ. With Auto-AZ, Databricks selects the AZ based on available IPs in the workspace subnets and retries in other availability zones if AWS returns insufficient capacity errors. If you want, you can also specify an availability zone to use. This benefits accounts that have reserved instances in a specific AZ. Specify the AZ as a string (for example, `\"us-west-2a\"`). The provided availability zone must be in the same region as the Databricks deployment. For example, \u201cus-west-2a\u201d is not a valid zone ID if the Databricks deployment resides in the \u201cus-east-1\u201d region. The list of available zones as well as the default value can be found by using the [GET /api/2.0/clusters/list-zones](https://docs.databricks.com/api/workspace/clusters/listzones) call. |\n| `instance_profile_arn` | `STRING` | Nodes for this cluster will only be placed on AWS instances with this instance profile. If omitted, nodes will be placed on instances without an instance profile. The instance profile must have previously been added to the Databricks environment by an account administrator. This feature may only be available to certain customer plans. 
|\n| `spot_bid_price_percent` | `INT32` | The max price for AWS spot instances, as a percentage of the corresponding instance type\u2019s on-demand price. For example, if this field is set to 50, and the cluster needs a new `i3.xlarge` spot instance, then the max price is half of the price of on-demand `i3.xlarge` instances. Similarly, if this field is set to 200, the max price is twice the price of on-demand `i3.xlarge` instances. If not specified, the default value is 100. When spot instances are requested for this cluster, only spot instances whose max price percentage matches this field will be considered. For safety, we enforce this field to be no more than 10000. |\n| `ebs_volume_type` | [EbsVolumeType](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterebsvolumetype) | The type of EBS volumes that will be launched with this cluster. |\n| `ebs_volume_count` | `INT32` | The number of volumes launched for each instance. You can choose up to 10 volumes. This feature is only enabled for supported node types. Legacy node types cannot specify custom EBS volumes. For node types with no instance store, at least one EBS volume needs to be specified; otherwise, cluster creation will fail. These EBS volumes will be mounted at `/ebs0`, `/ebs1`, and so on. Instance store volumes will be mounted at `/local_disk0`, `/local_disk1`, and so on. If EBS volumes are attached, Databricks will configure Spark to use only the EBS volumes for scratch storage because heterogeneously sized scratch devices can lead to inefficient disk utilization. If no EBS volumes are attached, Databricks will configure Spark to use instance store volumes. If EBS volumes are specified, then the Spark configuration `spark.local.dir` will be overridden. |\n| `ebs_volume_size` | `INT32` | The size of each EBS volume (in GiB) launched for each instance. For general purpose SSD, this value must be within the range 100 - 4096. 
For throughput optimized HDD, this value must be within the range 500 - 4096. Custom EBS volumes cannot be specified for the legacy node types (*memory-optimized* and *compute-optimized*). |\n| `ebs_volume_iops` | `INT32` | The number of IOPS per EBS gp3 volume. This value must be between 3000 and 16000. The value of IOPS and throughput is calculated based on AWS documentation to match the maximum performance of a gp2 volume with the same volume size. For more information, see the [EBS volume limit calculator](https://github.com/awslabs/aws-support-tools/tree/master/EBS/VolumeLimitCalculator). |\n| `ebs_volume_throughput` | `INT32` | The throughput per EBS gp3 volume, in MiB per second. This value must be between 125 and 1000. | \nIf neither `ebs_volume_iops` nor `ebs_volume_throughput` is specified, the values are inferred from the disk size: \n| Disk size | IOPS | Throughput |\n| --- | --- | --- |\n| Greater than 1000 | 3 times the disk size, up to 16000 | 250 |\n| Between 170 and 1000 | 3000 | 250 |\n| Below 170 | 3000 | 125 | \n### [AwsAvailability](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id52) \nThe set of AWS availability types supported when setting up nodes for a cluster. \n| Type | Description |\n| --- | --- |\n| `SPOT` | Use spot instances. |\n| `ON_DEMAND` | Use on-demand instances. |\n| `SPOT_WITH_FALLBACK` | Preferably use spot instances, but fall back to on-demand instances if spot instances cannot be acquired (for example, if AWS spot prices are too high). | \n### [ClusterInstance](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id53) \nIdentifiers for the cluster and Spark context used by a run. These two values together identify an execution context across all time. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `cluster_id` | `STRING` | The canonical identifier for the cluster used by a run. This field is always available for runs on existing clusters. 
For runs on new clusters, it becomes available once the cluster is created. This value can be used to view logs by browsing to `/#setting/sparkui/$cluster_id/driver-logs`. The logs will continue to be available after the run completes. The response won\u2019t include this field if the identifier is not available yet. |\n| `spark_context_id` | `STRING` | The canonical identifier for the Spark context used by a run. This field will be filled in once the run begins execution. This value can be used to view the Spark UI by browsing to `/#setting/sparkui/$cluster_id/$spark_context_id`. The Spark UI will continue to be available after the run has completed. The response won\u2019t include this field if the identifier is not available yet. | \n### [ClusterLogConf](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id54) \nPath to cluster log. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `dbfs` OR `s3` | [DbfsStorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterclusterlogconfdbfsstorageinfo) [S3StorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterclusterinitscriptinfos3storageinfo) | DBFS location of cluster log. Destination must be provided. For example, `{ \"dbfs\" : { \"destination\" : \"dbfs:/home/cluster_log\" } }` S3 location of cluster log. `destination` and either `region` or `warehouse` must be provided. For example, `{ \"s3\": { \"destination\" : \"s3://cluster_log_bucket/prefix\", \"region\" : \"us-west-2\" } }` | \n### [ClusterSpec](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id55) \nImportant \n* When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.\n* When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing. 
\n| Field Name | Type | Description |\n| --- | --- | --- |\n| `existing_cluster_id` OR `new_cluster` | `STRING` OR [NewCluster](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsclusterspecnewcluster) | If existing\\_cluster\\_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability. If new\\_cluster, a description of a cluster that will be created for each run. If specifying a [PipelineTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobspipelinetask), then this field can be empty. |\n| `libraries` | An array of [Library](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#managedlibrarieslibrary) | An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list. | \n### [ClusterTag](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id56) \nCluster tag definition. \n| Type | Description |\n| --- | --- |\n| `STRING` | The key of the tag. The key length must be between 1 and 127 UTF-8 characters, inclusive. For a list of all restrictions, see AWS Tag Restrictions: |\n| `STRING` | The value of the tag. The value length must be less than or equal to 255 UTF-8 characters. For a list of all restrictions, see AWS Tag Restrictions: | \n### [CronSchedule](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id57) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `quartz_cron_expression` | `STRING` | A Cron expression using Quartz syntax that describes the schedule for a job. See [Cron Trigger](http://www.quartz-scheduler.org/documentation/quartz-2.3.0/tutorials/crontrigger.html) for details. This field is required. |\n| `timezone_id` | `STRING` | A Java timezone ID. The schedule for a job will be resolved with respect to this timezone. 
See [Java TimeZone](https://docs.oracle.com/javase/7/docs/api/java/util/TimeZone.html) for details. This field is required. |\n| `pause_status` | `STRING` | Indicate whether this schedule is paused or not. Either \u201cPAUSED\u201d or \u201cUNPAUSED\u201d. | \n### [DbfsStorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id58) \nDBFS storage information. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `destination` | `STRING` | DBFS destination. Example: `dbfs:/my/path` | \n### [EbsVolumeType](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id59) \nDatabricks supports gp2 and gp3 EBS volume types. Follow the instructions at [Manage SSD storage](https://docs.databricks.com/admin/clusters/manage-ssd.html) to select gp2 or gp3 for your workspace. \n| Type | Description |\n| --- | --- |\n| `GENERAL_PURPOSE_SSD` | Provision extra storage using AWS EBS volumes. |\n| `THROUGHPUT_OPTIMIZED_HDD` | Provision extra storage using AWS st1 volumes. | \n### [FileStorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id60) \nFile storage information. \nNote \nThis location type is only available for clusters set up using [Databricks Container Services](https://docs.databricks.com/compute/custom-containers.html). \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `destination` | `STRING` | File destination. Example: `file:/my/file.sh` | \n### [InitScriptInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id61) \nPath to an init script. \nFor instructions on using init scripts with [Databricks Container Services](https://docs.databricks.com/compute/custom-containers.html), see [Use an init script](https://docs.databricks.com/compute/custom-containers.html#containers-init-script). \nNote \nThe file storage type (field name: `file`) is only available for clusters set up using [Databricks Container Services](https://docs.databricks.com/compute/custom-containers.html). 
See [FileStorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterclusterinitscriptinfofilestorageinfo). \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `workspace` OR `dbfs` (deprecated) OR `S3` | [WorkspaceStorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterclusterinitscriptinfoworkspacestorageinfo) [DbfsStorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterclusterlogconfdbfsstorageinfo) (deprecated) [S3StorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterclusterinitscriptinfos3storageinfo) | Workspace location of init script. Destination must be provided. For example, `{ \"workspace\" : { \"destination\" : \"/Users/someone@domain.com/init_script.sh\" } }` (Deprecated) DBFS location of init script. Destination must be provided. For example, `{ \"dbfs\" : { \"destination\" : \"dbfs:/home/init_script\" } }` S3 location of init script. Destination and either region or warehouse must be provided. For example, `{ \"s3\": { \"destination\" : \"s3://init_script_bucket/prefix\", \"region\" : \"us-west-2\" } }` | \n### [Job](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id62) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `job_id` | `INT64` | The canonical identifier for this job. |\n| `creator_user_name` | `STRING` | The creator user name. This field won\u2019t be included in the response if the user has already been deleted. |\n| `run_as` | `STRING` | The user name that the job will run as. `run_as` is based on the current job settings, and is set to the creator of the job if job access control is disabled, or the `is_owner` permission if job access control is enabled. |\n| `settings` | [JobSettings](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettings) | Settings for this job and all of its runs. These settings can be updated using the `resetJob` method. 
|\n| `created_time` | `INT64` | The time at which this job was created in epoch milliseconds (milliseconds since 1/1/1970 UTC). | \n### [JobEmailNotifications](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id63) \nImportant \nThe on\\_start, on\\_success, and on\\_failure fields accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese characters, Japanese kanji, and emojis. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `on_start` | An array of `STRING` | A list of email addresses to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. |\n| `on_success` | An array of `STRING` | A list of email addresses to be notified when a run successfully completes. A run is considered to have completed successfully if it ends with a `TERMINATED` `life_cycle_state` and a `SUCCESSFUL` `result_state`. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. |\n| `on_failure` | An array of `STRING` | A list of email addresses to be notified when a run unsuccessfully completes. A run is considered to have completed unsuccessfully if it ends with an `INTERNAL_ERROR` `life_cycle_state` or a `SKIPPED`, `FAILED`, or `TIMED_OUT` result\\_state. If this is not specified on job creation, reset, or update, the list is empty, and notifications are not sent. |\n| `on_duration_warning_threshold_exceeded` | An array of `STRING` | A list of email addresses to be notified when the duration of a run exceeds the threshold specified for the `RUN_DURATION_SECONDS` metric in the `health` field. If no rule for the `RUN_DURATION_SECONDS` metric is specified in the `health` field for the job, notifications are not sent. |\n| `no_alert_for_skipped_runs` | `BOOL` | If true, do not send email to recipients specified in `on_failure` if the run is skipped. 
| \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `on_start` | An array of [Webhook](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobwebhook) | An optional list of system destinations to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. A maximum of 3 destinations can be specified for the `on_start` property. |\n| `on_success` | An array of [Webhook](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobwebhook) | An optional list of system destinations to be notified when a run completes successfully. A run is considered to have completed successfully if it ends with a `TERMINATED` `life_cycle_state` and a `SUCCESSFUL` `result_state`. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. A maximum of 3 destinations can be specified for the `on_success` property. |\n| `on_failure` | An array of [Webhook](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobwebhook) | An optional list of system destinations to be notified when a run completes unsuccessfully. A run is considered to have completed unsuccessfully if it ends with an `INTERNAL_ERROR` `life_cycle_state` or a `SKIPPED`, `FAILED`, or `TIMED_OUT` result\\_state. If this is not specified on job creation, reset, or update the list is empty, and notifications are not sent. A maximum of 3 destinations can be specified for the `on_failure` property. |\n| `on_duration_warning_threshold_exceeded` | An array of [Webhook](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobwebhook) | An optional list of system destinations to be notified when the duration of a run exceeds the threshold specified for the `RUN_DURATION_SECONDS` metric in the `health` field. A maximum of 3 destinations can be specified for the `on_duration_warning_threshold_exceeded` property. 
| \n### [JobNotificationSettings](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id64) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `no_alert_for_skipped_runs` | `BOOL` | If true, do not send notifications to recipients specified in `on_failure` if the run is skipped. |\n| `no_alert_for_canceled_runs` | `BOOL` | If true, do not send notifications to recipients specified in `on_failure` if the run is canceled. |\n| `alert_on_last_attempt` | `BOOL` | If true, do not send notifications to recipients specified in `on_start` for the retried runs and do not send notifications to recipients specified in `on_failure` until the last retry of the run. | \n### [JobSettings](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id65) \nImportant \n* When you run a job on a new jobs cluster, the job is treated as a Jobs Compute (automated) workload subject to Jobs Compute pricing.\n* When you run a job on an existing all-purpose cluster, it is treated as an All-Purpose Compute (interactive) workload subject to All-Purpose Compute pricing. \nSettings for a job. These settings can be updated using the `resetJob` method. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `existing_cluster_id` OR `new_cluster` | `STRING` OR [NewCluster](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsclusterspecnewcluster) | If existing\\_cluster\\_id, the ID of an existing cluster that will be used for all runs of this job. When running jobs on an existing cluster, you may need to manually restart the cluster if it stops responding. We suggest running jobs on new clusters for greater reliability. If new\\_cluster, a description of a cluster that will be created for each run. If specifying a [PipelineTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobspipelinetask), then this field can be empty. 
|\n| `notebook_task` OR `spark_jar_task` OR `spark_python_task` OR `spark_submit_task` OR `pipeline_task` OR `run_job_task` | [NotebookTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsnotebooktask) OR [SparkJarTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobssparkjartask) OR [SparkPythonTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobssparkpythontask) OR [SparkSubmitTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobssparksubmittask) OR [PipelineTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobspipelinetask) OR [RunJobTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrunjobtask) | If notebook\\_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark\\_jar\\_task. If spark\\_jar\\_task, indicates that this job should run a JAR. If spark\\_python\\_task, indicates that this job should run a Python file. If spark\\_submit\\_task, indicates that this job should be launched by the spark submit script. If pipeline\\_task, indicates that this job should run a Delta Live Tables pipeline. If run\\_job\\_task, indicates that this job should run another job. |\n| `name` | `STRING` | An optional name for the job. The default value is `Untitled`. |\n| `libraries` | An array of [Library](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#managedlibrarieslibrary) | An optional list of libraries to be installed on the cluster that will execute the job. The default value is an empty list. |\n| `email_notifications` | [JobEmailNotifications](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobemailnotifications) | An optional set of email addresses that will be notified when runs of this job begin or complete as well as when this job is deleted. The default behavior is to not send any emails. 
|\n| `webhook_notifications` | [WebhookNotifications](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobsystemnotifications) | An optional set of system destinations to notify when runs of this job begin, complete, or fail. |\n| `notification_settings` | [JobNotificationSettings](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobnotificationsettings) | Optional notification settings that are used when sending notifications to each of the `email_notifications` and `webhook_notifications` for this job. |\n| `timeout_seconds` | `INT32` | An optional timeout applied to each run of this job. The default behavior is to have no timeout. |\n| `max_retries` | `INT32` | An optional maximum number of times to retry an unsuccessful run. A run is considered to be unsuccessful if it completes with the `FAILED` result\\_state or `INTERNAL_ERROR` `life_cycle_state`. The value -1 means to retry indefinitely and the value 0 means to never retry. The default behavior is to never retry. |\n| `min_retry_interval_millis` | `INT32` | An optional minimal interval in milliseconds between attempts. The default behavior is that unsuccessful runs are immediately retried. |\n| `retry_on_timeout` | `BOOL` | An optional policy to specify whether to retry a job when it times out. The default behavior is to not retry on timeout. |\n| `schedule` | [CronSchedule](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobscronschedule) | An optional periodic schedule for this job. The default behavior is that the job will only run when triggered by clicking \u201cRun Now\u201d in the Jobs UI or sending an API request to `runNow`. |\n| `max_concurrent_runs` | `INT32` | An optional maximum allowed number of concurrent runs of the job. Set this value if you want to be able to execute multiple runs of the same job concurrently. 
This is useful for example if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs which differ by their input parameters. This setting affects only new runs. For example, suppose the job\u2019s concurrency is 4 and there are 4 concurrent active runs. Then setting the concurrency to 3 won\u2019t kill any of the active runs. However, from then on, new runs will be skipped unless there are fewer than 3 active runs. This value cannot exceed 1000. Setting this value to 0 causes all new runs to be skipped. The default behavior is to allow only 1 concurrent run. |\n| `health` | [JobsHealthRules](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobshealthrules) | An optional set of health rules defined for the job. | \n### [JobTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id66) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `notebook_task` OR `spark_jar_task` OR `spark_python_task` OR `spark_submit_task` OR `pipeline_task` OR `run_job_task` | [NotebookTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsnotebooktask) OR [SparkJarTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobssparkjartask) OR [SparkPythonTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobssparkpythontask) OR [SparkSubmitTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobssparksubmittask) OR [PipelineTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobspipelinetask) OR [RunJobTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrunjobtask) | If notebook\\_task, indicates that this job should run a notebook. This field may not be specified in conjunction with spark\\_jar\\_task. If spark\\_jar\\_task, indicates that this job should run a JAR. If spark\\_python\\_task, indicates that this job should run a Python file. 
If spark\\_submit\\_task, indicates that this job should be launched by the spark submit script. If pipeline\\_task, indicates that this job should run a Delta Live Tables pipeline. If run\\_job\\_task, indicates that this job should run another job. | \n### [JobsHealthRule](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id67) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `metric` | `STRING` | Specifies the health metric that is being evaluated for a particular health rule. Valid values are `RUN_DURATION_SECONDS`. |\n| `operator` | `STRING` | Specifies the operator used to compare the health metric value with the specified threshold. Valid values are `GREATER_THAN`. |\n| `value` | `INT32` | Specifies the threshold value that the health metric should meet to comply with the health rule. | \n### [JobsHealthRules](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id68) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `rules` | An array of [JobsHealthRule](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobshealthrule) | An optional set of health rules that can be defined for a job. | \n### [Library](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id69) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `jar` OR `egg` OR `whl` OR `pypi` OR `maven` OR `cran` | `STRING` OR `STRING` OR `STRING` OR [PythonPyPiLibrary](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#managedlibrariespythonpypilibrary) OR [MavenLibrary](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#managedlibrariesmavenlibrary) OR [RCranLibrary](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#managedlibrariesrcranlibrary) | If jar, URI of the JAR to be installed. DBFS and S3 URIs are supported. For example: `{ \"jar\": \"dbfs:/mnt/databricks/library.jar\" }` or `{ \"jar\": \"s3://my-bucket/library.jar\" }`. 
If S3 is used, make sure the cluster has read access on the library. You may need to launch the cluster with an instance profile to access the S3 URI. If egg, URI of the egg to be installed. DBFS and S3 URIs are supported. For example: `{ \"egg\": \"dbfs:/my/egg\" }` or `{ \"egg\": \"s3://my-bucket/egg\" }`. If S3 is used, make sure the cluster has read access on the library. You may need to launch the cluster with an instance profile to access the S3 URI. If whl, URI of the `wheel` or zipped `wheels` to be installed. DBFS and S3 URIs are supported. For example: `{ \"whl\": \"dbfs:/my/whl\" }` or `{ \"whl\": \"s3://my-bucket/whl\" }`. If S3 is used, make sure the cluster has read access on the library. You may need to launch the cluster with an instance profile to access the S3 URI. Also, the `wheel` file name must follow the [correct convention](https://www.python.org/dev/peps/pep-0427/#file-format). If zipped `wheels` are to be installed, the file name suffix should be `.wheelhouse.zip`. If pypi, specification of a PyPI library to be installed. Specifying the `repo` field is optional; if it is not specified, the default pip index is used. For example: `{ \"package\": \"simplejson\", \"repo\": \"https://my-repo.com\" }`. If maven, specification of a Maven library to be installed. For example: `{ \"coordinates\": \"org.jsoup:jsoup:1.7.2\" }`. If cran, specification of a CRAN library to be installed. | \n### [MavenLibrary](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id70) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `coordinates` | `STRING` | Gradle-style Maven coordinates. For example: `org.jsoup:jsoup:1.7.2`. This field is required. |\n| `repo` | `STRING` | Maven repo to install the Maven package from. If omitted, both Maven Central Repository and Spark Packages are searched. |\n| `exclusions` | An array of `STRING` | List of dependencies to exclude. For example: `[\"slf4j:slf4j\", \"*:hadoop-client\"]`. Maven dependency exclusions: . 
| \n### [NewCluster](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id71) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `num_workers` OR `autoscale` | `INT32` OR [AutoScale](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterautoscale) | If num\\_workers, number of worker nodes that this cluster should have. A cluster has one Spark driver and num\\_workers executors for a total of num\\_workers + 1 Spark nodes. When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual current number of workers. For example, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in `spark_info` will gradually increase from 5 to 10 as the new nodes are provisioned. If autoscale, the required parameters to automatically scale clusters up and down based on load. |\n| `spark_version` | `STRING` | The Spark version of the cluster. A list of available Spark versions can be retrieved by using the [GET 2.0/clusters/spark-versions](https://docs.databricks.com/api/workspace/clusters/sparkversions) call. This field is required. |\n| `spark_conf` | [SparkConfPair](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clustersparkconfpair) | An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` respectively. Example Spark confs: `{\"spark.speculation\": true, \"spark.streaming.ui.retainedBatches\": 5}` or `{\"spark.driver.extraJavaOptions\": \"-verbose:gc -XX:+PrintGCDetails\"}` |\n| `aws_attributes` | [AwsAttributes](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterawsattributes) | Attributes related to clusters running on Amazon Web Services. 
If not specified at cluster creation, a set of default values will be used. |\n| `node_type_id` | `STRING` | This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory- or compute-intensive workloads. A list of available node types can be retrieved by using the [GET 2.0/clusters/list-node-types](https://docs.databricks.com/api/workspace/clusters/listnodetypes) call. This field, the `instance_pool_id` field, or a cluster policy that specifies a node type ID or instance pool ID, is required. |\n| `driver_node_type_id` | `STRING` | The node type of the Spark driver. This field is optional; if unset, the driver node type will be set to the same value as `node_type_id` defined above. |\n| `ssh_public_keys` | An array of `STRING` | SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name `ubuntu` on port `2200`. Up to 10 keys can be specified. |\n| `custom_tags` | [ClusterTag](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterclustertag) | An object containing a set of tags for cluster resources. Databricks tags all cluster resources (such as AWS instances and EBS volumes) with these tags in addition to default\\_tags. **Note**: Tags are not supported on legacy node types such as compute-optimized and memory-optimized. Databricks allows at most 45 custom tags. |\n| `cluster_log_conf` | [ClusterLogConf](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterclusterlogconf) | The configuration for delivering Spark logs to a long-term storage destination. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every `5 mins`. The destination of driver logs is `//driver`, while the destination of executor logs is `//executor`. 
|\n| `init_scripts` | An array of [InitScriptInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clusterclusterinitscriptinfo) | The configuration for storing init scripts. Any number of scripts can be specified. The scripts are executed sequentially in the order provided. If `cluster_log_conf` is specified, init script logs are sent to `//init_scripts`. |\n| `spark_env_vars` | [SparkEnvPair](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#clustersparkenvpair) | An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs of the form (X,Y) are exported as is (i.e., `export X='Y'`) while launching the driver and workers. To specify an additional set of `SPARK_DAEMON_JAVA_OPTS`, we recommend appending them to `$SPARK_DAEMON_JAVA_OPTS` as shown in the following example. This ensures that all default Databricks-managed environment variables are included as well. Example Spark environment variables: `{\"SPARK_WORKER_MEMORY\": \"28000m\", \"SPARK_LOCAL_DIRS\": \"/local_disk0\"}` or `{\"SPARK_DAEMON_JAVA_OPTS\": \"$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true\"}` |\n| `enable_elastic_disk` | `BOOL` | Autoscaling Local Storage: when enabled, this cluster dynamically acquires additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly. Refer to [Enable autoscaling local storage](https://docs.databricks.com/compute/configure.html#autoscaling-local-storage) for details. |\n| `driver_instance_pool_id` | `STRING` | The optional ID of the instance pool to use for the driver node. You must also specify `instance_pool_id`. Refer to the [Instance Pools API](https://docs.databricks.com/api/workspace/instancepools) for details. |\n| `instance_pool_id` | `STRING` | The optional ID of the instance pool to use for cluster nodes. 
If `driver_instance_pool_id` is present, `instance_pool_id` is used for worker nodes only. Otherwise, it is used for both the driver node and worker nodes. Refer to the [Instance Pools API](https://docs.databricks.com/api/workspace/instancepools) for details. | \n### [NotebookOutput](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id72) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `result` | `STRING` | The value passed to [dbutils.notebook.exit()](https://docs.databricks.com/notebooks/notebook-workflows.html#notebook-workflows-exit). Databricks restricts this API to return the first 1 MB of the value. For a larger result, your job can store the results in a cloud storage service. This field will be absent if `dbutils.notebook.exit()` was never called. |\n| `truncated` | `BOOLEAN` | Whether or not the result was truncated. | \n### [NotebookTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id73) \nAll output cells are subject to a size limit of 8 MB. If the output of a cell exceeds this limit, the rest of the run is canceled and the run is marked as failed. In that case, some of the content output from other cells may also be missing. \nIf you need help finding the cell that is beyond the limit, run the notebook against an all-purpose cluster and use this [notebook autosave technique](https://kb.databricks.com/notebooks/notebook-autosave.html). \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `notebook_path` | `STRING` | The absolute path of the notebook to be run in the Databricks workspace. This path must begin with a slash. This field is required. |\n| `revision_timestamp` | `LONG` | The timestamp of the revision of the notebook. |\n| `base_parameters` | A map of [ParamPair](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsparampair) | Base parameters to be used for each run of this job. 
If the run is initiated by a call to `run-now` with parameters specified, the two parameter maps will be merged. If the same key is specified in `base_parameters` and in `run-now`, the value from `run-now` will be used. Use [Pass context about job runs into job tasks](https://docs.databricks.com/workflows/jobs/parameter-value-references.html) to set parameters containing information about job runs. If the notebook takes a parameter that is not specified in the job\u2019s `base_parameters` or the `run-now` override parameters, the default value from the notebook will be used. Retrieve these parameters in a notebook using [dbutils.widgets.get](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets). | \n### [ParamPair](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id74) \nName-based parameters for jobs running notebook tasks. \nImportant \nThe fields in this data structure accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese characters, Japanese kanji, and emojis. \n| Type | Description |\n| --- | --- |\n| `STRING` | Parameter name. Pass to [dbutils.widgets.get](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets) to retrieve the value. |\n| `STRING` | Parameter value. | \n### [PipelineTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id75) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `pipeline_id` | `STRING` | The full name of the Delta Live Tables pipeline task to execute. | \n### [PythonPyPiLibrary](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id76) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `package` | `STRING` | The name of the PyPI package to install. An optional exact version specification is also supported. Examples: `simplejson` and `simplejson==3.8.0`. This field is required. 
|\n| `repo` | `STRING` | The repository where the package can be found. If not specified, the default pip index is used. | \n### [RCranLibrary](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id77) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `package` | `STRING` | The name of the CRAN package to install. This field is required. |\n| `repo` | `STRING` | The repository where the package can be found. If not specified, the default CRAN repo is used. | \n### [Run](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id78) \nAll the information about a run except for its output. The output can be retrieved separately\nwith the `getRunOutput` method. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `job_id` | `INT64` | The canonical identifier of the job that contains this run. |\n| `run_id` | `INT64` | The canonical identifier of the run. This ID is unique across all runs of all jobs. |\n| `creator_user_name` | `STRING` | The creator user name. This field won\u2019t be included in the response if the user has already been deleted. |\n| `number_in_job` | `INT64` | The sequence number of this run among all runs of the job. This value starts at 1. |\n| `original_attempt_run_id` | `INT64` | If this run is a retry of a prior run attempt, this field contains the run\\_id of the original attempt; otherwise, it is the same as the run\\_id. |\n| `state` | [RunState](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrunstate) | The result and lifecycle states of the run. |\n| `schedule` | [CronSchedule](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobscronschedule) | The cron schedule that triggered this run if it was triggered by the periodic scheduler. |\n| `task` | [JobTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobtask) | The task performed by the run, if any. 
|\n| `cluster_spec` | [ClusterSpec](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsclusterspec) | A snapshot of the job\u2019s cluster specification when this run was created. |\n| `cluster_instance` | [ClusterInstance](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsclusterinstance) | The cluster used for this run. If the run is specified to use a new cluster, this field will be set once the Jobs service has requested a cluster for the run. |\n| `overriding_parameters` | [RunParameters](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrunparameters) | The parameters used for this run. |\n| `start_time` | `INT64` | The time at which this run was started in epoch milliseconds (milliseconds since 1/1/1970 UTC). This may not be the time when the job task starts executing, for example, if the job is scheduled to run on a new cluster, this is the time the cluster creation call is issued. |\n| `setup_duration` | `INT64` | The time it took to set up the cluster in milliseconds. For runs that run on new clusters this is the cluster creation time, for runs that run on existing clusters this time should be very short. |\n| `execution_duration` | `INT64` | The time in milliseconds it took to execute the commands in the JAR or notebook until they completed, failed, timed out, were cancelled, or encountered an unexpected error. |\n| `cleanup_duration` | `INT64` | The time in milliseconds it took to terminate the cluster and clean up any associated artifacts. The total duration of the run is the sum of the setup\\_duration, the execution\\_duration, and the cleanup\\_duration. |\n| `end_time` | `INT64` | The time at which this run ended in epoch milliseconds (milliseconds since 1/1/1970 UTC). This field will be set to 0 if the job is still running. |\n| `trigger` | [TriggerType](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobstriggertype) | The type of trigger that fired this run. 
|\n| `run_name` | `STRING` | An optional name for the run. The default value is `Untitled`. The maximum allowed length is 4096 bytes in UTF-8 encoding. |\n| `run_page_url` | `STRING` | The URL to the detail page of the run. |\n| `run_type` | `STRING` | The type of the run.* `JOB_RUN` - Normal job run. A run created with [Run now](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsservicerunnow). * `WORKFLOW_RUN` - Workflow run. A run created with [dbutils.notebook.run](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-workflow). * `SUBMIT_RUN` - Submit run. A run created with [Run now](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsservicerunnow). |\n| `attempt_number` | `INT32` | The sequence number of this run attempt for a triggered job run. The initial attempt of a run has an attempt\\_number of 0. If the initial run attempt fails, and the job has a retry policy (`max_retries` > 0), subsequent runs are created with an `original_attempt_run_id` of the original attempt\u2019s ID and an incrementing `attempt_number`. Runs are retried only until they succeed, and the maximum `attempt_number` is the same as the `max_retries` value for the job. | \n### [RunJobTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id79) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `job_id` | `INT32` | Unique identifier of the job to run. This field is required. | \n### [RunLifeCycleState](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id80) \nThe life cycle state of a run. Allowed state transitions are: \n* `PENDING` -> `RUNNING` -> `TERMINATING` -> `TERMINATED`\n* `PENDING` -> `SKIPPED`\n* `PENDING` -> `INTERNAL_ERROR`\n* `RUNNING` -> `INTERNAL_ERROR`\n* `TERMINATING` -> `INTERNAL_ERROR` \n| State | Description |\n| --- | --- |\n| `PENDING` | The run has been triggered. If there is not already an active run of the same job, the cluster and execution context are being prepared. 
If there is already an active run of the same job, the run will immediately transition into the `SKIPPED` state without preparing any resources. |\n| `RUNNING` | The task of this run is being executed. |\n| `TERMINATING` | The task of this run has completed, and the cluster and execution context are being cleaned up. |\n| `TERMINATED` | The task of this run has completed, and the cluster and execution context have been cleaned up. This state is terminal. |\n| `SKIPPED` | This run was aborted because a previous run of the same job was already active. This state is terminal. |\n| `INTERNAL_ERROR` | An exceptional state that indicates a failure in the Jobs service, such as network failure over a long period. If a run on a new cluster ends in the `INTERNAL_ERROR` state, the Jobs service terminates the cluster as soon as possible. This state is terminal. | \n### [RunParameters](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id81) \nParameters for this run. Only one of jar\\_params, `python_params`, or notebook\\_params\nshould be specified in the `run-now` request, depending on the type of job task.\nJobs with Spark JAR task or Python task take a list of position-based parameters, and jobs\nwith notebook tasks take a key value map. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `jar_params` | An array of `STRING` | A list of parameters for jobs with Spark JAR tasks, e.g. `\"jar_params\": [\"john doe\", \"35\"]`. The parameters will be used to invoke the main function of the main class specified in the Spark JAR task. If not specified upon `run-now`, it will default to an empty list. jar\\_params cannot be specified in conjunction with notebook\\_params. The JSON representation of this field (i.e. `{\"jar_params\":[\"john doe\",\"35\"]}`) cannot exceed 10,000 bytes. 
Use [Pass context about job runs into job tasks](https://docs.databricks.com/workflows/jobs/parameter-value-references.html) to set parameters containing information about job runs. |\n| `notebook_params` | A map of [ParamPair](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsparampair) | A map from keys to values for jobs with notebook tasks, e.g. `\"notebook_params\": {\"name\": \"john doe\", \"age\": \"35\"}`. The map is passed to the notebook and is accessible through the [dbutils.widgets.get](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets) function. If not specified upon `run-now`, the triggered run uses the job\u2019s base parameters. notebook\\_params cannot be specified in conjunction with jar\\_params. Use [Pass context about job runs into job tasks](https://docs.databricks.com/workflows/jobs/parameter-value-references.html) to set parameters containing information about job runs. The JSON representation of this field (i.e. `{\"notebook_params\":{\"name\":\"john doe\",\"age\":\"35\"}}`) cannot exceed 10,000 bytes. |\n| `python_params` | An array of `STRING` | A list of parameters for jobs with Python tasks, e.g. `\"python_params\": [\"john doe\", \"35\"]`. The parameters are passed to the Python file as command-line parameters. If specified upon `run-now`, it overwrites the parameters specified in the job settings. The JSON representation of this field (i.e. `{\"python_params\":[\"john doe\",\"35\"]}`) cannot exceed 10,000 bytes. Use [Pass context about job runs into job tasks](https://docs.databricks.com/workflows/jobs/parameter-value-references.html) to set parameters containing information about job runs. Important: These parameters accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese characters, Japanese kanji, and emojis. 
|\n| `spark_submit_params` | An array of `STRING` | A list of parameters for jobs with Spark submit tasks, e.g. `\"spark_submit_params\": [\"--class\", \"org.apache.spark.examples.SparkPi\"]`. The parameters are passed to the spark-submit script as command-line parameters. If specified upon `run-now`, it overwrites the parameters specified in the job settings. The JSON representation of this field (i.e. `{\"spark_submit_params\":[\"--class\",\"org.apache.spark.examples.SparkPi\"]}`) cannot exceed 10,000 bytes. Use [Pass context about job runs into job tasks](https://docs.databricks.com/workflows/jobs/parameter-value-references.html) to set parameters containing information about job runs. Important: These parameters accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese characters, Japanese kanji, and emojis. | \n### [RunResultState](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id82) \nThe result state of the run. \n* If `life_cycle_state` = `TERMINATED`: if the run had a task, the result is guaranteed to be available, and it indicates the result of the task.\n* If `life_cycle_state` = `PENDING`, `RUNNING`, or `SKIPPED`: the result state is not available.\n* If `life_cycle_state` = `TERMINATING` or `life_cycle_state` = `INTERNAL_ERROR`: the result state is available if the run had a task and managed to start it. \nOnce available, the result state never changes. \n| State | Description |\n| --- | --- |\n| `SUCCESS` | The task completed successfully. |\n| `FAILED` | The task completed with an error. |\n| `TIMEDOUT` | The run was stopped after reaching the timeout. |\n| `CANCELED` | The run was canceled at user request. 
| \n### [RunState](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id83) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `life_cycle_state` | [RunLifeCycleState](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrunlifecyclestate) | A description of a run\u2019s current location in the run lifecycle. This field is always available in the response. |\n| `result_state` | [RunResultState](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsrunresultstate) | The result state of a run. If it is not available, the response won\u2019t include this field. See [RunResultState](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#runresultstate) for details about the availability of result\\_state. |\n| `user_cancelled_or_timedout` | `BOOLEAN` | Whether a run was canceled manually by a user or by the scheduler because the run timed out. |\n| `state_message` | `STRING` | A descriptive message for the current state. This field is unstructured, and its exact format is subject to change. | \n### [S3StorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id84) \nS3 storage information. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `destination` | `STRING` | S3 destination. For example: `s3://my-bucket/some-prefix`. You must configure the cluster with an instance profile and the instance profile must have write access to the destination. You *cannot* use AWS keys. |\n| `region` | `STRING` | S3 region. For example: `us-west-2`. Either region or warehouse must be set. If both are set, warehouse is used. |\n| `warehouse` | `STRING` | S3 warehouse. For example: `https://s3-us-west-2.amazonaws.com`. Either region or warehouse must be set. If both are set, warehouse is used. |\n| `enable_encryption` | `BOOL` | (Optional) Enable server-side encryption. `false` by default. |\n| `encryption_type` | `STRING` | (Optional) The encryption type, either `sse-s3` or `sse-kms`. 
It is used only when encryption is enabled and the default type is `sse-s3`. |\n| `kms_key` | `STRING` | (Optional) KMS key used if encryption is enabled and encryption type is set to `sse-kms`. |\n| `canned_acl` | `STRING` | (Optional) Set canned access control list. For example: `bucket-owner-full-control`. If canned\\_acl is set, the cluster instance profile must have `s3:PutObjectAcl` permission on the destination bucket and prefix. The full list of possible canned ACLs can be found at . By default only the object owner gets full control. If you are using cross account role for writing data, you may want to set `bucket-owner-full-control` to make bucket owner able to read the logs. | \n### [SparkConfPair](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id85) \nSpark configuration key-value pairs. \n| Type | Description |\n| --- | --- |\n| `STRING` | A configuration property name. |\n| `STRING` | The configuration property value. | \n### [SparkEnvPair](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id86) \nSpark environment variable key-value pairs. \nImportant \nWhen specifying environment variables in a job cluster, the fields in this data structure accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese, Japanese kanjis, and emojis. \n| Type | Description |\n| --- | --- |\n| `STRING` | An environment variable name. |\n| `STRING` | The environment variable value. | \n### [SparkJarTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id87) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `jar_uri` | `STRING` | Deprecated since 04/2016. Provide a `jar` through the `libraries` field instead. For an example, see [Create](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#create). |\n| `main_class_name` | `STRING` | The full name of the class containing the main method to be executed. 
This class must be contained in a JAR provided as a library. The code should use `SparkContext.getOrCreate` to obtain a Spark context; otherwise, runs of the job will fail. |\n| `parameters` | An array of `STRING` | Parameters passed to the main method. Use [Pass context about job runs into job tasks](https://docs.databricks.com/workflows/jobs/parameter-value-references.html) to set parameters containing information about job runs. | \n### [SparkPythonTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id88) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `python_file` | `STRING` | The URI of the Python file to be executed. DBFS and S3 paths are supported. This field is required. |\n| `parameters` | An array of `STRING` | Command line parameters passed to the Python file. Use [Pass context about job runs into job tasks](https://docs.databricks.com/workflows/jobs/parameter-value-references.html) to set parameters containing information about job runs. | \n### [SparkSubmitTask](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id89) \nImportant \n* You can invoke Spark submit tasks only on new clusters.\n* In the new\\_cluster specification, `libraries` and `spark_conf` are not supported. Instead, use `--jars` and `--py-files` to add Java and Python libraries and `--conf` to set the Spark configuration.\n* `master`, `deploy-mode`, and `executor-cores` are automatically configured by Databricks;\nyou *cannot* specify them in parameters.\n* By default, the Spark submit job uses all available memory (excluding reserved memory for\nDatabricks services). You can set `--driver-memory` and `--executor-memory` to a\nsmaller value to leave some room for off-heap usage.\n* The `--jars`, `--py-files`, and `--files` arguments support DBFS and S3 paths. \nFor example, assuming the JAR is uploaded to DBFS, you can run `SparkPi` by setting the following parameters. 
\n```\n{\n\"parameters\": [\n\"--class\",\n\"org.apache.spark.examples.SparkPi\",\n\"dbfs:/path/to/examples.jar\",\n\"10\"\n]\n}\n\n``` \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `parameters` | An array of `STRING` | Command-line parameters passed to spark submit. Use [Pass context about job runs into job tasks](https://docs.databricks.com/workflows/jobs/parameter-value-references.html) to set parameters containing information about job runs. | \n### [TriggerType](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id90) \nThese are the types of triggers that can fire a run. \n| Type | Description |\n| --- | --- |\n| `PERIODIC` | Schedules that periodically trigger runs, such as a cron scheduler. |\n| `ONE_TIME` | One-time triggers that fire a single run. This occurs when you trigger a single run on demand through the UI or the API. |\n| `RETRY` | Indicates a run that is triggered as a retry of a previously failed run. This occurs when you request to re-run the job in case of failures. | \n### [ViewItem](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id91) \nThe exported content is in HTML format. For example, if the view to export is dashboards, one HTML string is returned for every dashboard. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `content` | `STRING` | Content of the view. |\n| `name` | `STRING` | Name of the view item. In the case of code view, the notebook\u2019s name. In the case of dashboard view, the dashboard\u2019s name. |\n| `type` | [ViewType](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsviewtype) | Type of the view item. | \n### [ViewType](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id92) \n| Type | Description |\n| --- | --- |\n| `NOTEBOOK` | Notebook view item. |\n| `DASHBOARD` | Dashboard view item. | \n### [ViewsToExport](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id93) \nView to export: either code, all dashboards, or all. 
\n| Type | Description |\n| --- | --- |\n| `CODE` | Code view of the notebook. |\n| `DASHBOARDS` | All dashboard views of the notebook. |\n| `ALL` | All views of the notebook. | \n### [Webhook](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id94) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `id` | `STRING` | Identifier referencing a system notification destination. This field is required. | \n### [WebhookNotifications](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id95) \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `on_start` | An array of [Webhook](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobwebhook) | An optional list of system destinations to be notified when a run begins. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. A maximum of 3 destinations can be specified for the `on_start` property. |\n| `on_success` | An array of [Webhook](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobwebhook) | An optional list of system destinations to be notified when a run completes successfully. A run is considered to have completed successfully if it ends with a `TERMINATED` `life_cycle_state` and a `SUCCESS` `result_state`. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. A maximum of 3 destinations can be specified for the `on_success` property. |\n| `on_failure` | An array of [Webhook](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobwebhook) | An optional list of system destinations to be notified when a run completes unsuccessfully. A run is considered to have completed unsuccessfully if it ends with an `INTERNAL_ERROR` `life_cycle_state` or a `SKIPPED`, `FAILED`, or `TIMEDOUT` `result_state`. If not specified on job creation, reset, or update, the list is empty, and notifications are not sent. 
A maximum of 3 destinations can be specified for the `on_failure` property. |\n| `on_duration_warning_threshold_exceeded` | An array of [Webhook](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsettingsjobwebhook) | An optional list of system destinations to be notified when the duration of a run exceeds the threshold specified for the `RUN_DURATION_SECONDS` metric in the `health` field. A maximum of 3 destinations can be specified for the `on_duration_warning_threshold_exceeded` property. | \n### [WorkspaceStorageInfo](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#id96) \nWorkspace storage information. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `destination` | `STRING` | File destination. Example: `/Users/someone@domain.com/init_script.sh` |\n\n", "chunk_id": "e1c7dde47df3c7d87f5b5387beb55298", "url": "https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Information schema\n##### VOLUMES\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_static/images/icons/check.png) Databricks Runtime 13.3 LTS and above ![check marked yes](https://docs.databricks.com/_images/check.png) Unity Catalog only \nINFORMATION\\_SCHEMA.VOLUMES contains the object-level metadata for [volumes](https://docs.databricks.com/sql/language-manual/sql-ref-volumes.html) within the local catalog, or within all catalogs if owned by the `SYSTEM` catalog. \nThe rows returned are limited to the volumes the user is privileged to interact with. 
\nThis is an extension to the SQL Standard Information Schema.\n\n", "chunk_id": "063703937a115bde0e85f98b0b94d455", "url": "https://docs.databricks.com/sql/language-manual/information-schema/volumes.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Information schema\n##### VOLUMES\n###### Definition\n\nThe `VOLUMES` relation contains the following columns: \n| Name | Data type | Nullable | Description |\n| --- | --- | --- | --- |\n| `VOLUME_CATALOG` | `STRING` | No | Catalog that contains the volume. |\n| `VOLUME_SCHEMA` | `STRING` | No | Schema that contains the volume. |\n| `VOLUME_NAME` | `STRING` | No | Name of the volume. |\n| `VOLUME_TYPE` | `STRING` | No | One of `'MANAGED'`, `'EXTERNAL'`. |\n| `VOLUME_OWNER` | `STRING` | No | User or group (principal) currently owning the volume. |\n| `COMMENT` | `STRING` | Yes | An optional comment that describes the volume. |\n| `CREATED` | `TIMESTAMP` | No | Timestamp when the volume was created. |\n| `CREATED_BY` | `STRING` | No | [Principal](https://docs.databricks.com/sql/language-manual/sql-ref-principal.html) which created the volume. |\n| `LAST_ALTERED` | `TIMESTAMP` | No | Timestamp when the volume definition was last altered in any way. |\n| `LAST_ALTERED_BY` | `STRING` | No | [Principal](https://docs.databricks.com/sql/language-manual/sql-ref-principal.html) which last altered the volume. |\n| `STORAGE_LOCATION` | `STRING` | No | The storage location where the volume is created. 
|\n\n", "chunk_id": "df3e79f70a17c0ea78b5f7e6fe5f525f", "url": "https://docs.databricks.com/sql/language-manual/information-schema/volumes.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Information schema\n##### VOLUMES\n###### Constraints\n\nThe following constraints apply to the `VOLUMES` relation: \n| Class | Name | Column List | Description |\n| --- | --- | --- | --- |\n| Primary key | `VOLUMES_PK` | `VOLUME_CATALOG`, `VOLUME_SCHEMA`, `VOLUME_NAME` | Unique identifier for the volume. |\n| Foreign key | `VOLUME_SCHEMATA_FK` | `VOLUME_CATALOG`, `VOLUME_SCHEMA` | References [SCHEMATA](https://docs.databricks.com/sql/language-manual/information-schema/schemata.html). |\n\n##### VOLUMES\n###### Examples\n\n```\n> SELECT volume_owner\nFROM information_schema.volumes\nWHERE volume_schema = 'my_schema'\nAND volume_name = 'my_volume';\n\n```\n\n##### VOLUMES\n###### Related\n\n* [DESCRIBE VOLUME](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-describe-volume.html)\n* [Information schema](https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html)\n* [INFORMATION\\_SCHEMA.SCHEMATA](https://docs.databricks.com/sql/language-manual/information-schema/schemata.html)\n* [INFORMATION\\_SCHEMA.VOLUME\\_PRIVILEGES](https://docs.databricks.com/sql/language-manual/information-schema/volume_privileges.html)\n* [SHOW VOLUMES](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-show-volumes.html)\n\n", "chunk_id": "b849859ec62eecbcff789feb1abda8f2", "url": "https://docs.databricks.com/sql/language-manual/information-schema/volumes.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `<` (lt sign) operator\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) 
Databricks Runtime \nReturns `true` if `expr1` is less than `expr2`, or `false` otherwise.\n\n####### `<` (lt sign) operator\n######## Syntax\n\n```\nexpr1 < expr2\n\n```\n\n####### `<` (lt sign) operator\n######## Arguments\n\n* `expr1`: An expression of any comparable type.\n* `expr2`: An expression that shares a [least common type](https://docs.databricks.com/sql/language-manual/sql-ref-datatype-rules.html#least-common-type-resolution) with `expr1`.\n\n####### `<` (lt sign) operator\n######## Returns\n\nA BOOLEAN.\n\n####### `<` (lt sign) operator\n######## Examples\n\n```\n> SELECT 1 < 2;\ntrue\n> SELECT 1.1 < '1';\nfalse\n> SELECT to_date('2009-07-30 04:17:52') < to_date('2009-07-30 04:17:52');\nfalse\n> SELECT to_date('2009-07-30 04:17:52') < to_date('2009-08-01 04:17:52');\ntrue\n> SELECT 1 < NULL;\nNULL\n\n```\n\n", "chunk_id": "9e42833c69cb83e7e001e46b9d102a44", "url": "https://docs.databricks.com/sql/language-manual/functions/ltsign.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `<` (lt sign) operator\n######## Related\n\n* [!= (bangeq sign) operator](https://docs.databricks.com/sql/language-manual/functions/bangeqsign.html)\n* [<= (lt eq sign) operator](https://docs.databricks.com/sql/language-manual/functions/lteqsign.html)\n* [> (gt sign) operator](https://docs.databricks.com/sql/language-manual/functions/gtsign.html)\n* [>= (gt eq sign) operator](https://docs.databricks.com/sql/language-manual/functions/gteqsign.html)\n* [<=> (lt eq gt sign) operator](https://docs.databricks.com/sql/language-manual/functions/lteqgtsign.html)\n* [= (eq sign) operator](https://docs.databricks.com/sql/language-manual/functions/eqsign.html)\n* [<> (lt gt sign) operator](https://docs.databricks.com/sql/language-manual/functions/ltgtsign.html)\n* [SQL data type rules](https://docs.databricks.com/sql/language-manual/sql-ref-datatype-rules.html)\n\n", 
"chunk_id": "732e1c42631674f4f65007301478a21c", "url": "https://docs.databricks.com/sql/language-manual/functions/ltsign.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `rand` function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nReturns a random value between 0 and 1. This function is a synonym for [random function](https://docs.databricks.com/sql/language-manual/functions/random.html).\n\n####### `rand` function\n######## Syntax\n\n```\nrand( [seed] )\n\n```\n\n####### `rand` function\n######## Arguments\n\n* `seed`: An optional `INTEGER` literal.\n\n####### `rand` function\n######## Returns\n\nA `DOUBLE`. \nThe function generates pseudo random results with independent and identically distributed (i.i.d.) uniformly distributed values in [0, 1). \nThis function is non-deterministic.\n\n####### `rand` function\n######## Examples\n\n```\n> SELECT rand();\n0.9629742951434543\n> SELECT rand(0);\n0.8446490682263027\n> SELECT rand(null);\n0.8446490682263027\n\n```\n\n####### `rand` function\n######## Related functions\n\n* [randn function](https://docs.databricks.com/sql/language-manual/functions/randn.html)\n* [random function](https://docs.databricks.com/sql/language-manual/functions/random.html)\n\n", "chunk_id": "a7bf27c0a4580a3a3d0659cee5115985", "url": "https://docs.databricks.com/sql/language-manual/functions/rand.html"} +{"chunked_text": "# Databricks reference documentation\n### Delta Live Tables API guide\n\nImportant \nThis article\u2019s content has been retired and might not be updated. See [Delta Live Tables](https://docs.databricks.com/api/workspace/pipelines) in the Databricks REST API Reference. 
\nThe Delta Live Tables API allows you to create, edit, delete, start, and view details about pipelines. \nImportant \nTo access Databricks REST APIs, you must [authenticate](https://docs.databricks.com/dev-tools/auth/index.html).\n\n", "chunk_id": "d025c26693cbdcaa1b71aa1eb9e4fdf5", "url": "https://docs.databricks.com/delta-live-tables/api-guide.html"} +{"chunked_text": "# Databricks reference documentation\n### Delta Live Tables API guide\n#### Create a pipeline\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/pipelines` | `POST` | \nCreates a new Delta Live Tables pipeline. \n### Example \nThis example creates a new triggered pipeline. \n#### Request \n```\ncurl --netrc -X POST \\\nhttps:///api/2.0/pipelines \\\n--data @pipeline-settings.json\n\n``` \n`pipeline-settings.json`: \n```\n{\n\"name\": \"Wikipedia pipeline (SQL)\",\n\"storage\": \"/Users/username/data\",\n\"clusters\": [\n{\n\"label\": \"default\",\n\"autoscale\": {\n\"min_workers\": 1,\n\"max_workers\": 5,\n\"mode\": \"ENHANCED\"\n}\n}\n],\n\"libraries\": [\n{\n\"notebook\": {\n\"path\": \"/Users/username/DLT Notebooks/Delta Live Tables quickstart (SQL)\"\n}\n}\n],\n\"continuous\": false\n}\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. \n#### Response \n```\n{\n\"pipeline_id\": \"a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5\"\n}\n\n``` \n### Request structure \nSee [PipelineSettings](https://docs.databricks.com/delta-live-tables/api-guide.html#pipeline-spec). \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| pipeline\\_id | `STRING` | The unique identifier for the newly created pipeline. 
|\n\n", "chunk_id": "74fdc842aee2307dd40e4df06fcc3ec9", "url": "https://docs.databricks.com/delta-live-tables/api-guide.html"} +{"chunked_text": "# Databricks reference documentation\n### Delta Live Tables API guide\n#### Edit a pipeline\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/pipelines/{pipeline_id}` | `PUT` | \nUpdates the settings for an existing pipeline. \n### Example \nThis example adds a `target` parameter to the pipeline with ID `a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5`: \n#### Request \n```\ncurl --netrc -X PUT \\\nhttps:///api/2.0/pipelines/a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5 \\\n--data @pipeline-settings.json\n\n``` \n`pipeline-settings.json` \n```\n{\n\"id\": \"a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5\",\n\"name\": \"Wikipedia pipeline (SQL)\",\n\"storage\": \"/Users/username/data\",\n\"clusters\": [\n{\n\"label\": \"default\",\n\"autoscale\": {\n\"min_workers\": 1,\n\"max_workers\": 5,\n\"mode\": \"ENHANCED\"\n}\n}\n],\n\"libraries\": [\n{\n\"notebook\": {\n\"path\": \"/Users/username/DLT Notebooks/Delta Live Tables quickstart (SQL)\"\n}\n}\n],\n\"target\": \"wikipedia_quickstart_data\",\n\"continuous\": false\n}\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. \n### Request structure \nSee [PipelineSettings](https://docs.databricks.com/delta-live-tables/api-guide.html#pipeline-spec).\n\n", "chunk_id": "5e227a03b2750532a6832fed2d137e05", "url": "https://docs.databricks.com/delta-live-tables/api-guide.html"} +{"chunked_text": "# Databricks reference documentation\n### Delta Live Tables API guide\n#### Delete a pipeline\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/pipelines/{pipeline_id}` | `DELETE` | \nDeletes a pipeline from the Delta Live Tables system. 
\n### Example \nThis example deletes the pipeline with ID `a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5`: \n#### Request \n```\ncurl --netrc -X DELETE \\\nhttps:///api/2.0/pipelines/a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file.\n\n", "chunk_id": "1353a5169a82c624767e744752588388", "url": "https://docs.databricks.com/delta-live-tables/api-guide.html"} +{"chunked_text": "# Databricks reference documentation\n### Delta Live Tables API guide\n#### Start a pipeline update\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/pipelines/{pipeline_id}/updates` | `POST` | \nStarts an update for a pipeline. You can start an update for the entire pipeline graph, or a selective update of specific tables. \n### Examples \n#### Start a full refresh \nThis example starts an update with full refresh for the pipeline with ID `a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5`: \n##### Request \n```\ncurl --netrc -X POST \\\nhttps:///api/2.0/pipelines/a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5/updates \\\n--data '{ \"full_refresh\": \"true\" }'\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. 
\n##### Response \n```\n{\n\"update_id\": \"a1b23c4d-5e6f-78gh-91i2-3j4k5lm67no8\",\n\"request_id\": \"a1b23c4d-5e6f-78gh-91i2-3j4k5lm67no8\"\n}\n\n``` \n#### Start an update of selected tables \nThis example starts an update that refreshes the `sales_orders_cleaned` and `sales_order_in_chicago` tables in the pipeline with ID `a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5`: \n##### Request \n```\ncurl --netrc -X POST \\\nhttps:///api/2.0/pipelines/a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5/updates \\\n--data '{ \"refresh_selection\": [\"sales_orders_cleaned\", \"sales_order_in_chicago\"] }'\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. \n##### Response \n```\n{\n\"update_id\": \"a1b23c4d-5e6f-78gh-91i2-3j4k5lm67no8\",\n\"request_id\": \"a1b23c4d-5e6f-78gh-91i2-3j4k5lm67no8\"\n}\n\n``` \n#### Start a full update of selected tables \nThis example starts an update of the `sales_orders_cleaned` and `sales_order_in_chicago` tables, and an update with full refresh of the `customers` and `sales_orders_raw` tables in the pipeline with ID `a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5`. \n##### Request \n```\ncurl --netrc -X POST \\\nhttps:///api/2.0/pipelines/a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5/updates \\\n--data '{ \"refresh_selection\": [\"sales_orders_cleaned\", \"sales_order_in_chicago\"], \"full_refresh_selection\": [\"customers\", \"sales_orders_raw\"] }'\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. 
\n##### Response \n```\n{\n\"update_id\": \"a1b23c4d-5e6f-78gh-91i2-3j4k5lm67no8\",\n\"request_id\": \"a1b23c4d-5e6f-78gh-91i2-3j4k5lm67no8\"\n}\n\n``` \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `full_refresh` | `BOOLEAN` | Whether to reprocess all data. If `true`, the Delta Live Tables system resets all tables that are resettable before running the pipeline. This field is optional. The default value is `false`. An error is returned if `full_refresh` is true and either `refresh_selection` or `full_refresh_selection` is set. |\n| `refresh_selection` | An array of `STRING` | A list of tables to update. Use `refresh_selection` to start a refresh of a selected set of tables in the pipeline graph. This field is optional. If both `refresh_selection` and `full_refresh_selection` are empty, the entire pipeline graph is refreshed. An error is returned if: * `full_refresh` is true and `refresh_selection` is set. * One or more of the specified tables does not exist in the pipeline graph. |\n| `full_refresh_selection` | An array of `STRING` | A list of tables to update with full refresh. Use `full_refresh_selection` to start an update of a selected set of tables. The states of the specified tables are reset before the Delta Live Tables system starts the update. This field is optional. If both `refresh_selection` and `full_refresh_selection` are empty, the entire pipeline graph is refreshed. An error is returned if: * `full_refresh` is true and `refresh_selection` is set. * One or more of the specified tables does not exist in the pipeline graph. * One or more of the specified tables is not resettable. | \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `update_id` | `STRING` | The unique identifier of the newly created update. |\n| `request_id` | `STRING` | The unique identifier of the request that started the update. 
|\n\n", "chunk_id": "509c7b8fd805b652e0ecf1ed99d64888", "url": "https://docs.databricks.com/delta-live-tables/api-guide.html"} +{"chunked_text": "# Databricks reference documentation\n### Delta Live Tables API guide\n#### Get the status of a pipeline update request\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/pipelines/{pipeline_id}/requests/{request_id}` | `GET` | \nGets the status and information for the pipeline update associated with `request_id`, where `request_id` is a unique identifier for the request initiating the pipeline update. If the update is retried or restarted, then the new update inherits the request\\_id. \n### Example \nFor the pipeline with ID `a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5`, this example returns status and information for the update associated with request ID `a83d9f7c-d798-4fd5-aa39-301b6e6f4429`: \n#### Request \n```\ncurl --netrc -X GET \\\nhttps:///api/2.0/pipelines/a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5/requests/a83d9f7c-d798-4fd5-aa39-301b6e6f4429\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. 
\n#### Response \n```\n{\n\"status\": \"TERMINATED\",\n\"latest_update\":{\n\"pipeline_id\": \"a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5\",\n\"update_id\": \"90da8183-89de-4715-b5a9-c243e67f0093\",\n\"config\":{\n\"id\": \"aae89b88-e97e-40c4-8e1a-1b7ac76657e8\",\n\"name\": \"Retail sales (SQL)\",\n\"storage\": \"/Users/username/data\",\n\"configuration\":{\n\"pipelines.numStreamRetryAttempts\": \"5\"\n},\n\"clusters\":[\n{\n\"label\": \"default\",\n\"autoscale\":{\n\"min_workers\": 1,\n\"max_workers\": 5,\n\"mode\": \"ENHANCED\"\n}\n}\n],\n\"libraries\":[\n{\n\"notebook\":{\n\"path\": \"/Users/username/DLT Notebooks/Delta Live Tables quickstart (SQL)\"\n}\n}\n],\n\"continuous\": false,\n\"development\": true,\n\"photon\": true,\n\"edition\": \"advanced\",\n\"channel\": \"CURRENT\"\n},\n\"cause\": \"API_CALL\",\n\"state\": \"COMPLETED\",\n\"cluster_id\": \"1234-567891-abcde123\",\n\"creation_time\": 1664304117145,\n\"full_refresh\": false,\n\"request_id\": \"a83d9f7c-d798-4fd5-aa39-301b6e6f4429\"\n}\n}\n\n``` \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `status` | `STRING` | The status of the pipeline update request. One of: * `ACTIVE`: An update for this request is actively running or may be retried in a new update. * `TERMINATED`: The request is terminated and will not be retried or restarted. |\n| `pipeline_id` | `STRING` | The unique identifier of the pipeline. |\n| `update_id` | `STRING` | The unique identifier of the update. |\n| `config` | [PipelineSettings](https://docs.databricks.com/delta-live-tables/api-guide.html#pipeline-spec) | The pipeline settings. |\n| `cause` | `STRING` | The trigger for the update. One of `API_CALL`, `RETRY_ON_FAILURE`, `SERVICE_UPGRADE`, `SCHEMA_CHANGE`, `JOB_TASK`, or `USER_ACTION`. |\n| `state` | `STRING` | The state of the update. One of `QUEUED`, `CREATED`, `WAITING_FOR_RESOURCES`, `INITIALIZING`, `RESETTING`, `SETTING_UP_TABLES`, `RUNNING`, `STOPPING`, `COMPLETED`, `FAILED`, or `CANCELED`. 
|\n| `cluster_id` | `STRING` | The identifier of the cluster running the update. |\n| `creation_time` | `INT64` | The timestamp when the update was created. |\n| `full_refresh` | `BOOLEAN` | Whether this update resets all tables before running. |\n| `refresh_selection` | An array of `STRING` | A list of tables to update without full refresh. |\n| `full_refresh_selection` | An array of `STRING` | A list of tables to update with full refresh. |\n| `request_id` | `STRING` | The unique identifier of the request that started the update. This is the value returned by the [update](https://docs.databricks.com/delta-live-tables/api-guide.html#start-update) request. If the update is retried or restarted, then the new update inherits the request\\_id. However, the `update_id` will be different. |\n\n", "chunk_id": "2b9b4feaa052950b80656d5cbf166775", "url": "https://docs.databricks.com/delta-live-tables/api-guide.html"} +{"chunked_text": "# Databricks reference documentation\n### Delta Live Tables API guide\n#### Stop any active pipeline update\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/pipelines/{pipeline_id}/stop` | `POST` | \nStops any active pipeline update. If no update is running, this request is a no-op. \nFor a continuous pipeline, the pipeline execution is paused. Tables currently processing finish refreshing, but downstream tables are not refreshed. On the next pipeline update, Delta Live Tables performs a selected refresh of tables that did not complete processing, and resumes processing of the remaining pipeline DAG. \nFor a triggered pipeline, the pipeline execution is stopped. Tables currently processing finish refreshing, but downstream tables are not refreshed. On the next pipeline update, Delta Live Tables refreshes all tables. 
\n### Example \nThis example stops an update for the pipeline with ID `a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5`: \n#### Request \n```\ncurl --netrc -X POST \\\nhttps:///api/2.0/pipelines/a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5/stop\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file.\n\n", "chunk_id": "efd10dc4dc4196898abd428bd9336de5", "url": "https://docs.databricks.com/delta-live-tables/api-guide.html"} +{"chunked_text": "# Databricks reference documentation\n### Delta Live Tables API guide\n#### List pipeline events\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/pipelines/{pipeline_id}/events` | `GET` | \nRetrieves events for a pipeline. \n### Example \nThis example retrieves a maximum of 5 events for the pipeline with ID `a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5`. \n#### Request \n```\ncurl --netrc -X GET \\\nhttps:///api/2.0/pipelines/a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5/events?max_results=5\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `page_token` | `STRING` | Page token returned by previous call. This field is mutually exclusive with all fields in this request except max\\_results. An error is returned if any fields other than max\\_results are set when this field is set. This field is optional. |\n| `max_results` | `INT32` | The maximum number of entries to return in a single page. The system may return fewer than `max_results` events in a response, even if there are more events available. This field is optional. 
The default value is 25. The maximum value is 100. An error is returned if the value of `max_results` is greater than 100. |\n| `order_by` | `STRING` | A string indicating a sort order by timestamp for the results, for example, `[\"timestamp asc\"]`. The sort order can be ascending or descending. By default, events are returned in descending order by timestamp. This field is optional. |\n| `filter` | `STRING` | Criteria to select a subset of results, expressed using a SQL-like syntax. The supported filters are:* `level='INFO'` (or `WARN` or `ERROR`) * `level in ('INFO', 'WARN')` * `id='[event-id]'` * `timestamp > 'TIMESTAMP'` (or `>=`,`<`,`<=`,`=`) Composite expressions are supported, for example: `level in ('ERROR', 'WARN') AND timestamp> '2021-07-22T06:37:33.083Z'` This field is optional. | \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `events` | An array of pipeline events. | The list of events matching the request criteria. |\n| `next_page_token` | `STRING` | If present, a token to fetch the next page of events. |\n| `prev_page_token` | `STRING` | If present, a token to fetch the previous page of events. |\n\n", "chunk_id": "217a8d28f0c0b7513f3c6420179afc5f", "url": "https://docs.databricks.com/delta-live-tables/api-guide.html"} +{"chunked_text": "# Databricks reference documentation\n### Delta Live Tables API guide\n#### Get pipeline details\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/pipelines/{pipeline_id}` | `GET` | \nGets details about a pipeline, including the pipeline settings and recent updates. 
\n### Example \nThis example gets details for the pipeline with ID `a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5`: \n#### Request \n```\ncurl --netrc -X GET \\\nhttps:///api/2.0/pipelines/a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. \n#### Response \n```\n{\n\"pipeline_id\": \"a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5\",\n\"spec\": {\n\"id\": \"a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5\",\n\"name\": \"Wikipedia pipeline (SQL)\",\n\"storage\": \"/Users/username/data\",\n\"clusters\": [\n{\n\"label\": \"default\",\n\"autoscale\": {\n\"min_workers\": 1,\n\"max_workers\": 5,\n\"mode\": \"ENHANCED\"\n}\n}\n],\n\"libraries\": [\n{\n\"notebook\": {\n\"path\": \"/Users/username/DLT Notebooks/Delta Live Tables quickstart (SQL)\"\n}\n}\n],\n\"target\": \"wikipedia_quickstart_data\",\n\"continuous\": false\n},\n\"state\": \"IDLE\",\n\"cluster_id\": \"1234-567891-abcde123\",\n\"name\": \"Wikipedia pipeline (SQL)\",\n\"creator_user_name\": \"username\",\n\"latest_updates\": [\n{\n\"update_id\": \"8a0b6d02-fbd0-11eb-9a03-0242ac130003\",\n\"state\": \"COMPLETED\",\n\"creation_time\": \"2021-08-13T00:37:30.279Z\"\n},\n{\n\"update_id\": \"a72c08ba-fbd0-11eb-9a03-0242ac130003\",\n\"state\": \"CANCELED\",\n\"creation_time\": \"2021-08-13T00:35:51.902Z\"\n},\n{\n\"update_id\": \"ac37d924-fbd0-11eb-9a03-0242ac130003\",\n\"state\": \"FAILED\",\n\"creation_time\": \"2021-08-13T00:33:38.565Z\"\n}\n],\n\"run_as_user_name\": \"username\"\n}\n\n``` \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `pipeline_id` | `STRING` | The unique identifier of the pipeline. |\n| `spec` | [PipelineSettings](https://docs.databricks.com/delta-live-tables/api-guide.html#pipeline-spec) | The pipeline settings. 
|\n| `state` | `STRING` | The state of the pipeline. One of `IDLE` or `RUNNING`. If state = `RUNNING`, then there is at least one active update. |\n| `cluster_id` | `STRING` | The identifier of the cluster running the pipeline. |\n| `name` | `STRING` | The user-friendly name for this pipeline. |\n| `creator_user_name` | `STRING` | The username of the pipeline creator. |\n| `latest_updates` | An array of [UpdateStateInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#update-state-info) | Status of the most recent updates for the pipeline, ordered with the newest update first. |\n| `run_as_user_name` | `STRING` | The username that the pipeline runs as. |\n\n", "chunk_id": "53262ab41cc9655f748ae38409033071", "url": "https://docs.databricks.com/delta-live-tables/api-guide.html"} +{"chunked_text": "# Databricks reference documentation\n### Delta Live Tables API guide\n#### Get update details\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/pipelines/{pipeline_id}/updates/{update_id}` | `GET` | \nGets details for a pipeline update. \n### Example \nThis example gets details for update `9a84f906-fc51-11eb-9a03-0242ac130003` for the pipeline with ID `a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5`: \n#### Request \n```\ncurl --netrc -X GET \\\nhttps:///api/2.0/pipelines/a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5/updates/9a84f906-fc51-11eb-9a03-0242ac130003\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. 
\n#### Response \n```\n{\n\"update\": {\n\"pipeline_id\": \"a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5\",\n\"update_id\": \"9a84f906-fc51-11eb-9a03-0242ac130003\",\n\"config\": {\n\"id\": \"a12cd3e4-0ab1-1abc-1a2b-1a2bcd3e4fg5\",\n\"name\": \"Wikipedia pipeline (SQL)\",\n\"storage\": \"/Users/username/data\",\n\"configuration\": {\n\"pipelines.numStreamRetryAttempts\": \"5\"\n},\n\"clusters\": [\n{\n\"label\": \"default\",\n\"autoscale\": {\n\"min_workers\": 1,\n\"max_workers\": 5,\n\"mode\": \"ENHANCED\"\n}\n}\n],\n\"libraries\": [\n{\n\"notebook\": {\n\"path\": \"/Users/username/DLT Notebooks/Delta Live Tables quickstart (SQL)\"\n}\n}\n],\n\"target\": \"wikipedia_quickstart_data\",\n\"continuous\": false,\n\"development\": false\n},\n\"cause\": \"API_CALL\",\n\"state\": \"COMPLETED\",\n\"creation_time\": 1628815050279,\n\"full_refresh\": true,\n\"request_id\": \"a83d9f7c-d798-4fd5-aa39-301b6e6f4429\"\n}\n}\n\n``` \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `pipeline_id` | `STRING` | The unique identifier of the pipeline. |\n| `update_id` | `STRING` | The unique identifier of this update. |\n| `config` | [PipelineSettings](https://docs.databricks.com/delta-live-tables/api-guide.html#pipeline-spec) | The pipeline settings. |\n| `cause` | `STRING` | The trigger for the update. One of `API_CALL`, `RETRY_ON_FAILURE`, `SERVICE_UPGRADE`. |\n| `state` | `STRING` | The state of the update. One of `QUEUED`, `CREATED`, `WAITING_FOR_RESOURCES`, `INITIALIZING`, `RESETTING`, `SETTING_UP_TABLES`, `RUNNING`, `STOPPING`, `COMPLETED`, `FAILED`, or `CANCELED`. |\n| `cluster_id` | `STRING` | The identifier of the cluster running the pipeline. |\n| `creation_time` | `INT64` | The timestamp when the update was created. |\n| `full_refresh` | `BOOLEAN` | Whether this was a full refresh. If true, all pipeline tables were reset before running the update. 
|\n\n", "chunk_id": "949498b7ac2f25d01f1aec5fa73af855", "url": "https://docs.databricks.com/delta-live-tables/api-guide.html"} +{"chunked_text": "# Databricks reference documentation\n### Delta Live Tables API guide\n#### List pipelines\n\n| Endpoint | HTTP Method |\n| --- | --- |\n| `2.0/pipelines/` | `GET` | \nLists pipelines defined in the Delta Live Tables system. \n### Example \nThis example retrieves details for pipelines where the name contains `quickstart`: \n#### Request \n```\ncurl --netrc -X GET \\\nhttps:///api/2.0/pipelines?filter=name%20LIKE%20%27%25quickstart%25%27\n\n``` \nReplace: \n* `` with the Databricks [workspace instance name](https://docs.databricks.com/workspace/workspace-details.html#workspace-url), for example `dbc-a1b2345c-d6e7.cloud.databricks.com`. \nThis example uses a [.netrc](https://everything.curl.dev/usingcurl/netrc) file. \n#### Response \n```\n{\n\"statuses\": [\n{\n\"pipeline_id\": \"e0f01758-fc61-11eb-9a03-0242ac130003\",\n\"state\": \"IDLE\",\n\"name\": \"DLT quickstart (Python)\",\n\"latest_updates\": [\n{\n\"update_id\": \"ee9ae73e-fc61-11eb-9a03-0242ac130003\",\n\"state\": \"COMPLETED\",\n\"creation_time\": \"2021-08-13T00:34:21.871Z\"\n}\n],\n\"creator_user_name\": \"username\"\n},\n{\n\"pipeline_id\": \"f4c82f5e-fc61-11eb-9a03-0242ac130003\",\n\"state\": \"IDLE\",\n\"name\": \"My DLT quickstart example\",\n\"creator_user_name\": \"username\"\n}\n],\n\"next_page_token\": \"eyJ...==\",\n\"prev_page_token\": \"eyJ..x9\"\n}\n\n``` \n### Request structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `page_token` | `STRING` | Page token returned by previous call. This field is optional. |\n| `max_results` | `INT32` | The maximum number of entries to return in a single page. The system may return fewer than `max_results` events in a response, even if there are more events available. This field is optional. The default value is 25. The maximum value is 100. 
An error is returned if the value of `max_results` is greater than 100. |\n| `order_by` | An array of `STRING` | A list of strings specifying the order of results, for example, `[\"name asc\"]`. Supported `order_by` fields are `id` and `name`. The default is `id asc`. This field is optional. |\n| `filter` | `STRING` | Select a subset of results based on the specified criteria. The supported filters are: `\"notebook=''\"` to select pipelines that reference the provided notebook path. `name LIKE '[pattern]'` to select pipelines with a name that matches `pattern`. Wildcards are supported, for example: `name LIKE '%shopping%'`. Composite filters are not supported. This field is optional. | \n### Response structure \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `statuses` | An array of [PipelineStateInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#pipeline-state-info) | The list of pipeline statuses matching the request criteria. |\n| `next_page_token` | `STRING` | If present, a token to fetch the next page of results. |\n| `prev_page_token` | `STRING` | If present, a token to fetch the previous page of results. 
|\n\n", "chunk_id": "2882829eb05b63fde1320985dbcff4d4", "url": "https://docs.databricks.com/delta-live-tables/api-guide.html"} +{"chunked_text": "# Databricks reference documentation\n### Delta Live Tables API guide\n#### Data structures\n\nIn this section: \n* [AwsAttributes](https://docs.databricks.com/delta-live-tables/api-guide.html#awsattributes)\n* [AwsAvailability](https://docs.databricks.com/delta-live-tables/api-guide.html#awsavailability)\n* [ClusterLogConf](https://docs.databricks.com/delta-live-tables/api-guide.html#clusterlogconf)\n* [DbfsStorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#dbfsstorageinfo)\n* [EbsVolumeType](https://docs.databricks.com/delta-live-tables/api-guide.html#ebsvolumetype)\n* [FileStorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#filestorageinfo)\n* [InitScriptInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#initscriptinfo)\n* [KeyValue](https://docs.databricks.com/delta-live-tables/api-guide.html#keyvalue)\n* [NotebookLibrary](https://docs.databricks.com/delta-live-tables/api-guide.html#notebooklibrary)\n* [PipelinesAutoScale](https://docs.databricks.com/delta-live-tables/api-guide.html#pipelinesautoscale)\n* [PipelineLibrary](https://docs.databricks.com/delta-live-tables/api-guide.html#pipelinelibrary)\n* [PipelinesNewCluster](https://docs.databricks.com/delta-live-tables/api-guide.html#pipelinesnewcluster)\n* [PipelineSettings](https://docs.databricks.com/delta-live-tables/api-guide.html#pipelinesettings)\n* [PipelineStateInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#pipelinestateinfo)\n* [S3StorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#s3storageinfo)\n* [UpdateStateInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#updatestateinfo)\n* [WorkspaceStorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#workspacestorageinfo) \n### 
[AwsAttributes](https://docs.databricks.com/delta-live-tables/api-guide.html#id37) \nAttributes set during cluster creation related to Amazon Web Services. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `first_on_demand` | `INT32` | The first first\\_on\\_demand nodes of the cluster will be placed on on-demand instances. If this value is greater than 0, the cluster driver node will be placed on an on-demand instance. If this value is greater than or equal to the current cluster size, all nodes will be placed on on-demand instances. If this value is less than the current cluster size, first\\_on\\_demand nodes will be placed on on-demand instances and the remainder will be placed on `availability` instances. This value does not affect cluster size and cannot be mutated over the lifetime of a cluster. |\n| `availability` | [AwsAvailability](https://docs.databricks.com/delta-live-tables/api-guide.html#clusterawsavailability) | Availability type used for all subsequent nodes past the first\\_on\\_demand ones. **Note:** If first\\_on\\_demand is zero, this availability type will be used for the entire cluster. |\n| `zone_id` | `STRING` | Identifier for the availability zone (AZ) in which the cluster resides. By default, the setting has a value of **auto**, otherwise known as Auto-AZ. With Auto-AZ, Databricks selects the AZ based on available IPs in the workspace subnets and retries in other availability zones if AWS returns insufficient capacity errors. If you want, you can also specify an availability zone to use. This benefits accounts that have reserved instances in a specific AZ. Specify the AZ as a string (for example, `\"us-west-2a\"`). The provided availability zone must be in the same region as the Databricks deployment. For example, \u201cus-west-2a\u201d is not a valid zone ID if the Databricks deployment resides in the \u201cus-east-1\u201d region. 
The list of available zones as well as the default value can be found by using the [GET /api/2.0/clusters/list-zones](https://docs.databricks.com/api/workspace/clusters/listzones) call. |\n| `instance_profile_arn` | `STRING` | Nodes for this cluster will only be placed on AWS instances with this instance profile. If omitted, nodes will be placed on instances without an instance profile. The instance profile must have previously been added to the Databricks environment by an account administrator. This feature may only be available to certain customer plans. |\n| `spot_bid_price_percent` | `INT32` | The max price for AWS spot instances, as a percentage of the corresponding instance type\u2019s on-demand price. For example, if this field is set to 50, and the cluster needs a new `i3.xlarge` spot instance, then the max price is half of the price of on-demand `i3.xlarge` instances. Similarly, if this field is set to 200, the max price is twice the price of on-demand `i3.xlarge` instances. If not specified, the default value is 100. When spot instances are requested for this cluster, only spot instances whose max price percentage matches this field will be considered. For safety, we enforce this field to be no more than 10000. |\n| `ebs_volume_type` | [EbsVolumeType](https://docs.databricks.com/delta-live-tables/api-guide.html#clusterebsvolumetype) | The type of EBS volumes that will be launched with this cluster. |\n| `ebs_volume_count` | `INT32` | The number of volumes launched for each instance. You can choose up to 10 volumes. This feature is only enabled for supported node types. Legacy node types cannot specify custom EBS volumes. For node types with no instance store, at least one EBS volume needs to be specified; otherwise, cluster creation will fail. These EBS volumes will be mounted at `/ebs0`, `/ebs1`, and so on. Instance store volumes will be mounted at `/local_disk0`, `/local_disk1`, and so on. 
If EBS volumes are attached, Databricks will configure Spark to use only the EBS volumes for scratch storage because heterogeneously sized scratch devices can lead to inefficient disk utilization. If no EBS volumes are attached, Databricks will configure Spark to use instance store volumes. If EBS volumes are specified, then the Spark configuration `spark.local.dir` will be overridden. |\n| `ebs_volume_size` | `INT32` | The size of each EBS volume (in GiB) launched for each instance. For general purpose SSD, this value must be within the range 100 - 4096. For throughput optimized HDD, this value must be within the range 500 - 4096. Custom EBS volumes cannot be specified for the legacy node types (*memory-optimized* and *compute-optimized*). |\n| `ebs_volume_iops` | `INT32` | The number of IOPS per EBS gp3 volume. This value must be between 3000 and 16000. The value of IOPS and throughput is calculated based on AWS documentation to match the maximum performance of a gp2 volume with the same volume size. For more information, see the [EBS volume limit calculator](https://github.com/awslabs/aws-support-tools/tree/master/EBS/VolumeLimitCalculator). |\n| `ebs_volume_throughput` | `INT32` | The throughput per EBS gp3 volume, in MiB per second. This value must be between 125 and 1000. | \nIf neither `ebs_volume_iops` nor `ebs_volume_throughput` is specified, the values are inferred from the disk size: \n| Disk size | IOPS | Throughput |\n| --- | --- | --- |\n| Greater than 1000 | 3 times the disk size, up to 16000 | 250 |\n| Between 170 and 1000 | 3000 | 250 |\n| Below 170 | 3000 | 125 | \n### [AwsAvailability](https://docs.databricks.com/delta-live-tables/api-guide.html#id38) \nThe set of AWS availability types supported when setting up nodes for a cluster. \n| Type | Description |\n| --- | --- |\n| `SPOT` | Use spot instances. |\n| `ON_DEMAND` | Use on-demand instances. 
|\n| `SPOT_WITH_FALLBACK` | Preferably use spot instances, but fall back to on-demand instances if spot instances cannot be acquired (for example, if AWS spot prices are too high). | \n### [ClusterLogConf](https://docs.databricks.com/delta-live-tables/api-guide.html#id39) \nPath to cluster log. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `dbfs` OR `s3` | [DbfsStorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#clusterclusterlogconfdbfsstorageinfo) [S3StorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#clusterclusterinitscriptinfos3storageinfo) | DBFS location of cluster log. Destination must be provided. For example, `{ \"dbfs\" : { \"destination\" : \"dbfs:/home/cluster_log\" } }` S3 location of cluster log. `destination` and either `region` or `warehouse` must be provided. For example, `{ \"s3\": { \"destination\" : \"s3://cluster_log_bucket/prefix\", \"region\" : \"us-west-2\" } }` | \n### [DbfsStorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#id40) \nDBFS storage information. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `destination` | `STRING` | DBFS destination. Example: `dbfs:/my/path` | \n### [EbsVolumeType](https://docs.databricks.com/delta-live-tables/api-guide.html#id41) \nDatabricks supports gp2 and gp3 EBS volume types. Follow the instructions at [Manage SSD storage](https://docs.databricks.com/admin/clusters/manage-ssd.html) to select gp2 or gp3 for your workspace. \n| Type | Description |\n| --- | --- |\n| `GENERAL_PURPOSE_SSD` | Provision extra storage using AWS EBS volumes. |\n| `THROUGHPUT_OPTIMIZED_HDD` | Provision extra storage using AWS st1 volumes. | \n### [FileStorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#id42) \nFile storage information. \nNote \nThis location type is only available for clusters set up using [Databricks Container Services](https://docs.databricks.com/compute/custom-containers.html). 
\n| Field Name | Type | Description |\n| --- | --- | --- |\n| `destination` | `STRING` | File destination. Example: `file:/my/file.sh` | \n### [InitScriptInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#id43) \nPath to an init script. \nFor instructions on using init scripts with [Databricks Container Services](https://docs.databricks.com/compute/custom-containers.html), see [Use an init script](https://docs.databricks.com/compute/custom-containers.html#containers-init-script). \nNote \nThe file storage type (field name: `file`) is only available for clusters set up using [Databricks Container Services](https://docs.databricks.com/compute/custom-containers.html). See [FileStorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#clusterclusterinitscriptinfofilestorageinfo). \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `workspace` OR `dbfs` (deprecated) OR `S3` | [WorkspaceStorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#clusterclusterinitscriptinfoworkspacestorageinfo) [DbfsStorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#clusterclusterlogconfdbfsstorageinfo) (deprecated) [S3StorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#clusterclusterinitscriptinfos3storageinfo) | Workspace location of init script. Destination must be provided. For example, `{ \"workspace\" : { \"destination\" : \"/Users/someone@domain.com/init_script.sh\" } }` (Deprecated) DBFS location of init script. Destination must be provided. For example, `{ \"dbfs\" : { \"destination\" : \"dbfs:/home/init_script\" } }` S3 location of init script. Destination and either region or warehouse must be provided. For example, `{ \"s3\": { \"destination\" : \"s3://init_script_bucket/prefix\", \"region\" : \"us-west-2\" } }` | \n### [KeyValue](https://docs.databricks.com/delta-live-tables/api-guide.html#id44) \nA key-value pair that specifies configuration parameters. 
\n| Field Name | Type | Description |\n| --- | --- | --- |\n| `key` | `STRING` | The configuration property name. |\n| `value` | `STRING` | The configuration property value. | \n### [NotebookLibrary](https://docs.databricks.com/delta-live-tables/api-guide.html#id45) \nA specification for a notebook containing pipeline code. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `path` | `STRING` | The absolute path to the notebook. This field is required. | \n### [PipelinesAutoScale](https://docs.databricks.com/delta-live-tables/api-guide.html#id46) \nAttributes defining an autoscaling cluster. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `min_workers` | `INT32` | The minimum number of workers to which the cluster can scale down when underutilized. It is also the initial number of workers the cluster will have after creation. |\n| `max_workers` | `INT32` | The maximum number of workers to which the cluster can scale up when overloaded. max\\_workers must be strictly greater than min\\_workers. |\n| `mode` | `STRING` | The autoscaling mode for the cluster:* `ENHANCED` to use [enhanced autoscaling](https://docs.databricks.com/delta-live-tables/auto-scaling.html). * `LEGACY` to use the [cluster autoscaling functionality](https://docs.databricks.com/compute/configure.html#autoscaling). | \n### [PipelineLibrary](https://docs.databricks.com/delta-live-tables/api-guide.html#id47) \nA specification for pipeline dependencies. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `notebook` | [NotebookLibrary](https://docs.databricks.com/delta-live-tables/api-guide.html#pipelines-notebook-library) | The path to a notebook defining Delta Live Tables datasets. The path must be in the Databricks workspace, for example: `{ \"notebook\" : { \"path\" : \"/my-pipeline-notebook-path\" } }`. | \n### [PipelinesNewCluster](https://docs.databricks.com/delta-live-tables/api-guide.html#id48) \nA pipeline cluster specification. 
\nThe Delta Live Tables system sets the following attributes. These attributes cannot be configured by users: \n* `spark_version` \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `label` | `STRING` | A label for the cluster specification, either `default` to configure the default cluster, or `maintenance` to configure the maintenance cluster. This field is optional. The default value is `default`. |\n| `spark_conf` | [KeyValue](https://docs.databricks.com/delta-live-tables/api-guide.html#pipelines-keyvalue) | An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` respectively. Example Spark confs: `{\"spark.speculation\": true, \"spark.streaming.ui.retainedBatches\": 5}` or `{\"spark.driver.extraJavaOptions\": \"-verbose:gc -XX:+PrintGCDetails\"}` |\n| `aws_attributes` | [AwsAttributes](https://docs.databricks.com/delta-live-tables/api-guide.html#clusterawsattributes) | Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used. |\n| `node_type_id` | `STRING` | This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory- or compute-intensive workloads. A list of available node types can be retrieved by using the [GET 2.0/clusters/list-node-types](https://docs.databricks.com/api/workspace/clusters/listnodetypes) call. |\n| `driver_node_type_id` | `STRING` | The node type of the Spark driver. This field is optional; if unset, the driver node type will be set as the same value as `node_type_id` defined above. |\n| `ssh_public_keys` | An array of `STRING` | SSH public key contents that will be added to each Spark node in this cluster. 
The corresponding private keys can be used to log in with the user name `ubuntu` on port `2200`. Up to 10 keys can be specified. |\n| `custom_tags` | [KeyValue](https://docs.databricks.com/delta-live-tables/api-guide.html#pipelines-keyvalue) | An object containing a set of tags for cluster resources. Databricks tags all cluster resources with these tags in addition to default\_tags. **Note**:* Tags are not supported on legacy node types such as compute-optimized and memory-optimized * Databricks allows at most 45 custom tags. |\n| `cluster_log_conf` | [ClusterLogConf](https://docs.databricks.com/delta-live-tables/api-guide.html#clusterclusterlogconf) | The configuration for delivering Spark logs to a long-term storage destination. Only one destination can be specified for one cluster. If this configuration is provided, the logs will be delivered to the destination every `5 mins`. The destination of driver logs is `//driver`, while the destination of executor logs is `//executor`. |\n| `spark_env_vars` | [KeyValue](https://docs.databricks.com/delta-live-tables/api-guide.html#pipelines-keyvalue) | An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs of the form (X,Y) are exported as is (that is, `export X='Y'`) while launching the driver and workers. To specify an additional set of `SPARK_DAEMON_JAVA_OPTS`, Databricks recommends appending them to `$SPARK_DAEMON_JAVA_OPTS` as shown in the following example. This ensures that all default Databricks managed environment variables are included as well. Example Spark environment variables: `{\"SPARK_WORKER_MEMORY\": \"28000m\", \"SPARK_LOCAL_DIRS\": \"/local_disk0\"}` or `{\"SPARK_DAEMON_JAVA_OPTS\": \"$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true\"}` |\n| `init_scripts` | An array of [InitScriptInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#clusterclusterinitscriptinfo) | The configuration for storing init scripts. 
Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If `cluster_log_conf` is specified, init script logs are sent to `//init_scripts`. |\n| `instance_pool_id` | `STRING` | The optional ID of the instance pool to which the cluster belongs. See [Pool configuration reference](https://docs.databricks.com/compute/pools.html). |\n| `driver_instance_pool_id` | `STRING` | The optional ID of the instance pool to use for the driver node. You must also specify `instance_pool_id`. See [Instance Pools API](https://docs.databricks.com/api/workspace/instancepools). |\n| `policy_id` | `STRING` | A [cluster policy](https://docs.databricks.com/api/workspace/clusterpolicies) ID. |\n| `num_workers OR autoscale` | `INT32` OR [PipelinesAutoScale](https://docs.databricks.com/delta-live-tables/api-guide.html#pipelines-autoscale) | If num\_workers, number of worker nodes that this cluster should have. A cluster has one Spark driver and num\_workers executors for a total of num\_workers + 1 Spark nodes. When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field is updated to reflect the target size of 10 workers, whereas the workers listed in executors gradually increase from 5 to 10 as the new nodes are provisioned. If autoscale, parameters needed to automatically scale clusters up and down based on load. This field is optional. |\n| `apply_policy_default_values` | `BOOLEAN` | Whether to use [policy](https://docs.databricks.com/api/workspace/clusterpolicies) default values for missing cluster attributes. | \n### [PipelineSettings](https://docs.databricks.com/delta-live-tables/api-guide.html#id49) \nThe settings for a pipeline deployment. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `id` | `STRING` | The unique identifier for this pipeline. 
The identifier is created by the Delta Live Tables system, and must not be provided when creating a pipeline. |\n| `name` | `STRING` | A user-friendly name for this pipeline. This field is optional. By default, the pipeline name must be unique. To use a duplicate name, set `allow_duplicate_names` to `true` in the pipeline configuration. |\n| `storage` | `STRING` | A path to a DBFS directory for storing checkpoints and tables created by the pipeline. This field is optional. The system uses a default location if this field is empty. |\n| `configuration` | A map of `STRING:STRING` | A list of key-value pairs to add to the Spark configuration of the cluster that will run the pipeline. This field is optional. Elements must be formatted as key:value pairs. |\n| `clusters` | An array of [PipelinesNewCluster](https://docs.databricks.com/delta-live-tables/api-guide.html#pipelines-new-cluster) | An array of specifications for the clusters to run the pipeline. This field is optional. If this is not specified, the system will select a default cluster configuration for the pipeline. |\n| `libraries` | An array of [PipelineLibrary](https://docs.databricks.com/delta-live-tables/api-guide.html#pipeline-library) | The notebooks containing the pipeline code and any dependencies required to run the pipeline. |\n| `target` | `STRING` | A database name for persisting pipeline output data. See [Publish data from Delta Live Tables to the Hive metastore](https://docs.databricks.com/delta-live-tables/publish.html) for more information. |\n| `continuous` | `BOOLEAN` | Whether this is a continuous pipeline. This field is optional. The default value is `false`. |\n| `development` | `BOOLEAN` | Whether to run the pipeline in development mode. This field is optional. The default value is `false`. |\n| `photon` | `BOOLEAN` | Whether Photon acceleration is enabled for this pipeline. This field is optional. The default value is `false`. 
|\n| `channel` | `STRING` | The Delta Live Tables release channel specifying the runtime version to use for this pipeline. Supported values are:* `preview` to test the pipeline with upcoming changes to the Delta Live Tables runtime. * `current` to use the current Delta Live Tables runtime version. This field is optional. The default value is `current`. |\n| `edition` | `STRING` | The Delta Live Tables product edition to run the pipeline:* `CORE` supports streaming ingest workloads. * `PRO` also supports streaming ingest workloads and adds support for change data capture (CDC) processing. * `ADVANCED` supports all the features of the `PRO` edition and adds support for workloads that require Delta Live Tables expectations to enforce data quality constraints. This field is optional. The default value is `advanced`. | \n### [PipelineStateInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#id50) \nThe state of a pipeline, the status of the most recent updates, and information about associated resources. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `state` | `STRING` | The state of the pipeline. One of `IDLE` or `RUNNING`. |\n| `pipeline_id` | `STRING` | The unique identifier of the pipeline. |\n| `cluster_id` | `STRING` | The unique identifier of the cluster running the pipeline. |\n| `name` | `STRING` | The user-friendly name of the pipeline. |\n| `latest_updates` | An array of [UpdateStateInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#update-state-info) | Status of the most recent updates for the pipeline, ordered with the newest update first. |\n| `creator_user_name` | `STRING` | The username of the pipeline creator. |\n| `run_as_user_name` | `STRING` | The username that the pipeline runs as. This is a read only value derived from the pipeline owner. | \n### [S3StorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#id51) \nS3 storage information. 
\n| Field Name | Type | Description |\n| --- | --- | --- |\n| `destination` | `STRING` | S3 destination. For example: `s3://my-bucket/some-prefix`. You must configure the cluster with an instance profile and the instance profile must have write access to the destination. You *cannot* use AWS keys. |\n| `region` | `STRING` | S3 region. For example: `us-west-2`. Either region or warehouse must be set. If both are set, warehouse is used. |\n| `warehouse` | `STRING` | S3 warehouse. For example: `https://s3-us-west-2.amazonaws.com`. Either region or warehouse must be set. If both are set, warehouse is used. |\n| `enable_encryption` | `BOOL` | (Optional) Enable server-side encryption, `false` by default. |\n| `encryption_type` | `STRING` | (Optional) The encryption type, either `sse-s3` or `sse-kms`. It is used only when encryption is enabled; the default type is `sse-s3`. |\n| `kms_key` | `STRING` | (Optional) KMS key used if encryption is enabled and encryption type is set to `sse-kms`. |\n| `canned_acl` | `STRING` | (Optional) Set canned access control list. For example: `bucket-owner-full-control`. If canned\\_acl is set, the cluster instance profile must have `s3:PutObjectAcl` permission on the destination bucket and prefix. The full list of possible canned ACLs can be found at . By default, only the object owner gets full control. If you use a cross-account role for writing data, you may want to set `bucket-owner-full-control` so that the bucket owner can read the logs. | \n### [UpdateStateInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#id52) \nThe current state of a pipeline update. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `update_id` | `STRING` | The unique identifier for this update. |\n| `state` | `STRING` | The state of the update. One of `QUEUED`, `CREATED`, `WAITING_FOR_RESOURCES`, `INITIALIZING`, `RESETTING`, `SETTING_UP_TABLES`, `RUNNING`, `STOPPING`, `COMPLETED`, `FAILED`, or `CANCELED`. 
|\n| `creation_time` | `STRING` | Timestamp when this update was created. | \n### [WorkspaceStorageInfo](https://docs.databricks.com/delta-live-tables/api-guide.html#id53) \nWorkspace storage information. \n| Field Name | Type | Description |\n| --- | --- | --- |\n| `destination` | `STRING` | File destination. Example: `/Users/someone@domain.com/init_script.sh` |\n\n", "chunk_id": "85087f9d74d4b73dec8c38d113724e69", "url": "https://docs.databricks.com/delta-live-tables/api-guide.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 3.5 LTS (unsupported)\n\nDatabricks released this image in December 2017. It was declared Long Term Support (LTS) in January 2018. Support ended on January 2, 2020. \nThe following release notes provide information about Databricks Runtime 3.5, powered by Apache Spark.\n\n#### Databricks Runtime 3.5 LTS (unsupported)\n##### Changes and improvements\n\n* Databricks Runtime 3.5 includes a number of performance and usability enhancements to Databricks Delta, which is in private preview. See [What is Delta Lake?](https://docs.databricks.com/delta/index.html).\n* Added `repartitionByRange` to the Dataset API.\n* Added \u201cHeap Histogram\u201d to Spark UI\u2019s Executors page for driver and executors. A heap histogram shows the total size and instance count for each class in the heap, which allows users to easily understand how different kinds of classes consume the heap memory of a driver or an executor. \n![Heap histogram all](https://docs.databricks.com/_images/jmap-1.png) \n![Heap histogram executor](https://docs.databricks.com/_images/jmap-2.png) \n* Improved window function performance. \n* R is available on serverless pools as a beta feature. Contact your Databricks account team to be included in the beta program. \n* Upgraded Python library setuptools from 36.6.0 to 38.2.3.\n* Upgraded several pre-installed R libraries. 
For details, see [Pre-installed R Libraries](https://docs.databricks.com/archive/runtime-release-notes/3.5.html#pre-installed-r-libraries).\n* Bug fixes and stability improvements.\n\n", "chunk_id": "c08fca4bdf639678ebdb3187e68df539", "url": "https://docs.databricks.com/archive/runtime-release-notes/3.5.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 3.5 LTS (unsupported)\n##### Apache Spark\n\nDatabricks Runtime 3.5 includes Apache Spark 2.2.1. This release includes all fixes and improvements included in [Databricks Runtime 3.4 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/3.4.html), as well as the following additional bug fixes and improvements made to Spark: \n* [[SPARK-20557]](https://issues.apache.org/jira/browse/SPARK-20557)[SQL] Only support TIMESTAMP WITH TIME ZONE for Oracle Dialect\n* [[SPARK-21652]](https://issues.apache.org/jira/browse/SPARK-21652)[SQL] Fix rule conflict between InferFiltersFromConstraints and the other Filter Reduction Rules\n* [[SPARK-22721]](https://issues.apache.org/jira/browse/SPARK-22721) BytesToBytesMap peak memory not updated.\n* [[SPARK-22431]](https://issues.apache.org/jira/browse/SPARK-22431)[SQL] Ensure that the datatype in the schema for the table/view metadata is parseable by Spark before persisting it\n* [[SPARK-22686]](https://issues.apache.org/jira/browse/SPARK-22686)[SQL] DROP TABLE IF EXISTS should not show AnalysisException\n* [[SPARK-21652]](https://issues.apache.org/jira/browse/SPARK-21652)[SQL] Fix rule conflict between InferFiltersFromConstraints and ConstantPropagation\n* [[SPARK-22162]](https://issues.apache.org/jira/browse/SPARK-22162) Executors and the driver should use consistent JobIDs in the RDD commit protocol\n* [[SPARK-22635]](https://issues.apache.org/jira/browse/SPARK-22635)[SQL][ORC] FileNotFoundException while reading ORC files containing special characters\n* 
[[SPARK-22601]](https://issues.apache.org/jira/browse/SPARK-22601)[SQL] Data load is getting displayed successful on providing non existing nonlocal file path\n* [[SPARK-22653]](https://issues.apache.org/jira/browse/SPARK-22653) executorAddress registered in CoarseGrainedSchedulerBac\u2026\n* [[SPARK-22373]](https://issues.apache.org/jira/browse/SPARK-22373) Bump Janino dependency version to fix thread safety issue\u2026\n* [[SPARK-22637]](https://issues.apache.org/jira/browse/SPARK-22637)[SQL] Only refresh a logical plan once.\n* [[SPARK-22603]](https://issues.apache.org/jira/browse/SPARK-22603)[SQL] Fix 64KB JVM bytecode limit problem with FormatString\n* [[SPARK-22595]](https://issues.apache.org/jira/browse/SPARK-22595)[SQL] fix flaky test: CastSuite.SPARK-22500: cast for struct should not generate codes beyond 64KB\n* [[SPARK-22591]](https://issues.apache.org/jira/browse/SPARK-22591)[SQL] GenerateOrdering shouldn\u2019t change CodegenContext.INPUT\\_ROW\n* [[SPARK-17920]](https://issues.apache.org/jira/browse/SPARK-17920)[[SPARK-19580]](https://issues.apache.org/jira/browse/SPARK-19580)[[SPARK-19878]](https://issues.apache.org/jira/browse/SPARK-19878)[SQL] Support writing to Hive table which uses Avro schema url \u2018avro.schema.url\u2019\n* [[SPARK-22548]](https://issues.apache.org/jira/browse/SPARK-22548)[SQL] Incorrect nested AND expression pushed down to JDBC data source\n* [[SPARK-22500]](https://issues.apache.org/jira/browse/SPARK-22500)[SQL] Fix 64KB JVM bytecode limit problem with cast\n* [[SPARK-22550]](https://issues.apache.org/jira/browse/SPARK-22550)[SQL] Fix 64KB JVM bytecode limit problem with elt\n* [[SPARK-22508]](https://issues.apache.org/jira/browse/SPARK-22508)[SQL] Fix 64KB JVM bytecode limit problem with GenerateUnsafeRowJoiner.create()\n* [[SPARK-22549]](https://issues.apache.org/jira/browse/SPARK-22549)[SQL] Fix 64KB JVM bytecode limit problem with concat\\_ws\n* [[SPARK-22498]](https://issues.apache.org/jira/browse/SPARK-22498)[SQL] Fix 
64KB JVM bytecode limit problem with concat\n* [[SPARK-22544]](https://issues.apache.org/jira/browse/SPARK-22544)[SS] FileStreamSource should use its own hadoop conf to call globPathIfNecessary\n* [[SPARK-22538]](https://issues.apache.org/jira/browse/SPARK-22538)[ML] SQLTransformer should not unpersist possibly cached input dataset\n* [[SPARK-22540]](https://issues.apache.org/jira/browse/SPARK-22540)[SQL] Ensure HighlyCompressedMapStatus calculates correct avgSize\n* [[SPARK-22535]](https://issues.apache.org/jira/browse/SPARK-22535)[PYSPARK] Sleep before killing the python worker in PythonRunner.MonitorThread (branch-2.2)\n* [[SPARK-22501]](https://issues.apache.org/jira/browse/SPARK-22501)[SQL] Fix 64KB JVM bytecode limit problem with in\n* [[SPARK-22494]](https://issues.apache.org/jira/browse/SPARK-22494)[SQL] Fix 64KB limit exception with Coalesce and AtleastNNonNulls\n* [[SPARK-22499]](https://issues.apache.org/jira/browse/SPARK-22499)[SQL] Fix 64KB JVM bytecode limit problem with least and greatest\n* [[SPARK-22479]](https://issues.apache.org/jira/browse/SPARK-22479)[SQL] Exclude credentials from SaveintoDataSourceCommand.simpleString\n* [[SPARK-22469]](https://issues.apache.org/jira/browse/SPARK-22469)[SQL] Accuracy problem in comparison with string and numeric\n* [[SPARK-22721]](https://issues.apache.org/jira/browse/SPARK-22721) BytesToBytesMap peak memory usage not accurate after reset()\n* [[SPARK-22614]](https://issues.apache.org/jira/browse/SPARK-22614) Dataset API: repartitionByRange(\u2026)\n* [[SPARK-22471]](https://issues.apache.org/jira/browse/SPARK-22471)[SQL] SQLListener consumes much memory causing OutOfMemoryError\n* [[SPARK-22442]](https://issues.apache.org/jira/browse/SPARK-22442)[SQL] ScalaReflection should produce correct field names for special characters\n* [[SPARK-21694]](https://issues.apache.org/jira/browse/SPARK-21694)[R][ML] Reduce max iterations in Linear SVM test in R to speed up AppVeyor build\n* 
[[SPARK-22464]](https://issues.apache.org/jira/browse/SPARK-22464)[SQL] No pushdown for Hive metastore partition predicates containing null-safe equality\n* [[SPARK-22488]](https://issues.apache.org/jira/browse/SPARK-22488)[SQL] Fix the view resolution issue in the SparkSession internal table() API\n* [[SPARK-21720]](https://issues.apache.org/jira/browse/SPARK-21720)[SQL] Fix 64KB JVM bytecode limit problem with AND or OR\n* [[SPARK-21667]](https://issues.apache.org/jira/browse/SPARK-21667)[STREAMING] ConsoleSink should not fail streaming query with checkpointLocation option\n* [[SPARK-19644]](https://issues.apache.org/jira/browse/SPARK-19644)[SQL] Clean up Scala reflection garbage after creating Encoder (branch-2.2)\n* [[SPARK-22284]](https://issues.apache.org/jira/browse/SPARK-22284)[SQL] Fix 64KB JVM bytecode limit problem in calculating hash for nested structs\n* [[SPARK-22243]](https://issues.apache.org/jira/browse/SPARK-22243)[DSTREAM] spark.yarn.jars should reload from config when checkpoint recovery\n* [[SPARK-22344]](https://issues.apache.org/jira/browse/SPARK-22344)[SPARKR] clean up install dir if running test as source package\n* [[SPARK-22472]](https://issues.apache.org/jira/browse/SPARK-22472)[SQL] add null check for top-level primitive values\n* [[SPARK-22403]](https://issues.apache.org/jira/browse/SPARK-22403)[SS] Add optional checkpointLocation argument to StructuredKafkaWordCount example\n* [[SPARK-22281]](https://issues.apache.org/jira/browse/SPARK-22281)[SPARKR] Handle R method breaking signature changes\n* [[SPARK-22417]](https://issues.apache.org/jira/browse/SPARK-22417)[PYTHON] Fix for createDataFrame from pandas.DataFrame with timestamp\n* [[SPARK-22315]](https://issues.apache.org/jira/browse/SPARK-22315)[SPARKR] Warn if SparkR package version doesn\u2019t match SparkContext\n* [[SPARK-22429]](https://issues.apache.org/jira/browse/SPARK-22429)[STREAMING] Streaming checkpointing code does not retry after failure\n* 
[[SPARK-22211]](https://issues.apache.org/jira/browse/SPARK-22211)[SQL] Remove incorrect FOJ limit pushdown\n* [[SPARK-22306]](https://issues.apache.org/jira/browse/SPARK-22306)[SQL] alter table schema should not erase the bucketing metadata at hive side\n* [[SPARK-22333]](https://issues.apache.org/jira/browse/SPARK-22333)[SQL] timeFunctionCall(CURRENT\\_DATE, CURRENT\\_TIMESTAMP) has conflicts with columnReference\n* [[SPARK-19611]](https://issues.apache.org/jira/browse/SPARK-19611)[SQL] set dataSchema correctly in HiveMetastoreCatalog.convertToLogicalRelation\n* [[SPARK-22291]](https://issues.apache.org/jira/browse/SPARK-22291)[SQL] Conversion error when transforming array types of uuid, inet and cidr to StingType in PostgreSQL\n* [[SPARK-22344]](https://issues.apache.org/jira/browse/SPARK-22344)[SPARKR] Set java.io.tmpdir for SparkR tests\n* [[SPARK-19727]](https://issues.apache.org/jira/browse/SPARK-19727)[SQL] Fix for round function that modifies original column\n* [[SPARK-22445]](https://issues.apache.org/jira/browse/SPARK-22445)[SQL] Respect stream-side child\u2019s needCopyResult in BroadcastHashJoin\n* [[SPARK-22542]](https://issues.apache.org/jira/browse/SPARK-22542)[SQL] remove unused features in ColumnarBatch\n* [[SPARK-17310]](https://issues.apache.org/jira/browse/SPARK-17310) Add an option to disable record-level filter in Parquet-side\n* [[SPARK-22514]](https://issues.apache.org/jira/browse/SPARK-22514)[SQL] move ColumnVector.Array and ColumnarBatch.Row to individual files\n* [[SPARK-22222]](https://issues.apache.org/jira/browse/SPARK-22222)[[SPARK-22033]](https://issues.apache.org/jira/browse/SPARK-22033) Fix the ARRAY\\_MAX in BufferHolder and add a test\n* [[SPARK-10365]](https://issues.apache.org/jira/browse/SPARK-10365)[SQL] Support Parquet logical type TIMESTAMP\\_MICROS\n* [[SPARK-22356]](https://issues.apache.org/jira/browse/SPARK-22356)[SQL] data source table should support overlapped columns between data and partition schema\n* 
[[SPARK-22355]](https://issues.apache.org/jira/browse/SPARK-22355)[SQL] Dataset.collect is not threadsafe\n* [[SPARK-22328]](https://issues.apache.org/jira/browse/SPARK-22328)[CORE] ClosureCleaner should not miss referenced superclass fields\n* [[SPARK-17902]](https://issues.apache.org/jira/browse/SPARK-17902)[R] Revive stringsAsFactors option for collect() in SparkR\n* [[SPARK-22227]](https://issues.apache.org/jira/browse/SPARK-22227)[CORE] DiskBlockManager.getAllBlocks now tolerates temp files\n* [[SPARK-22319]](https://issues.apache.org/jira/browse/SPARK-22319)[CORE] call loginUserFromKeytab before accessing hdfs\n* [[SPARK-21551]](https://issues.apache.org/jira/browse/SPARK-21551)[PYTHON] Increase timeout for PythonRDD.serveIterator\n* [[SPARK-22249]](https://issues.apache.org/jira/browse/SPARK-22249)[SQL] Check if list of value for IN is empty in the optimizer\n* [[SPARK-22271]](https://issues.apache.org/jira/browse/SPARK-22271)[SQL] mean overflows and returns null for some decimal variables\n* [[SPARK-22249]](https://issues.apache.org/jira/browse/SPARK-22249)[SQL] isin with empty list throws exception on cached DataFrame\n* [[SPARK-22223]](https://issues.apache.org/jira/browse/SPARK-22223)[SQL] ObjectHashAggregate should not introduce unnecessary shuffle\n* [[SPARK-21549]](https://issues.apache.org/jira/browse/SPARK-21549)[CORE] Respect OutputFormats with no/invalid output directory provided\n* [[SPARK-22273]](https://issues.apache.org/jira/browse/SPARK-22273)[SQL] Fix key/value schema field names in HashMapGenerators.\n* [[SPARK-14387]](https://issues.apache.org/jira/browse/SPARK-14387)[[SPARK-16628]](https://issues.apache.org/jira/browse/SPARK-16628)[[SPARK-18355]](https://issues.apache.org/jira/browse/SPARK-18355)[SQL] Use Spark schema to read ORC table instead of ORC file schema\n* [[SPARK-22252]](https://issues.apache.org/jira/browse/SPARK-22252)[SQL] FileFormatWriter should respect the input query schema\n* 
[[SPARK-22217]](https://issues.apache.org/jira/browse/SPARK-22217)[SQL] ParquetFileFormat to support arbitrary OutputCommitters\n* [[SPARK-21907]](https://issues.apache.org/jira/browse/SPARK-21907)[CORE] oom during spill\n* [[SPARK-22218]](https://issues.apache.org/jira/browse/SPARK-22218) spark shuffle services fails to update secret on app re-attempts\n* [[SPARK-21549]](https://issues.apache.org/jira/browse/SPARK-21549)[CORE] Respect OutputFormats with no output directory provided\n* [[SPARK-22445]](https://issues.apache.org/jira/browse/SPARK-22445)[SQL] move CodegenContext.copyResult to CodegenSupport\n* [[SPARK-22408]](https://issues.apache.org/jira/browse/SPARK-22408)[SQL] RelationalGroupedDataset\u2019s distinct pivot value calculation launches unnecessary stages\n* [[SPARK-17788]](https://issues.apache.org/jira/browse/SPARK-17788)[[SPARK-21033]](https://issues.apache.org/jira/browse/SPARK-21033)[SQL] fix the potential OOM in UnsafeExternalSorter and ShuffleExternalSorter\n* [[SPARK-22385]](https://issues.apache.org/jira/browse/SPARK-22385)[SQL] MapObjects should not access list element by index\n* [[SPARK-22355]](https://issues.apache.org/jira/browse/SPARK-22355)[SQL] Dataset.collect is not threadsafe\n* [[SPARK-21055]](https://issues.apache.org/jira/browse/SPARK-21055)[SQL] replace grouping\\_\\_id with grouping\\_id()\n* [[SPARK-21762]](https://issues.apache.org/jira/browse/SPARK-21762)[SQL] FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn\u2019t yet visible\n\n", "chunk_id": "f87ea3232042f3f500be654bf35db5ce", "url": "https://docs.databricks.com/archive/runtime-release-notes/3.5.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 3.5 LTS (unsupported)\n##### Maintenance updates\n\nSee [Databricks Runtime 3.5 maintenance updates](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#35).\n\n", 
"chunk_id": "c7fddc8fbf486d606cf7c68378a27ad1", "url": "https://docs.databricks.com/archive/runtime-release-notes/3.5.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 3.5 LTS (unsupported)\n##### System environment\n\n* **Operating System**: Ubuntu 16.04.3 LTS\n* **Java**: 1.8.0\\_151\n* **Scala**: 2.10.6 (Scala 2.10 cluster version)/2.11.8 (Scala 2.11 cluster version)\n* **Python**: 2.7.12 (or 3.5.2 if using Python 3)\n* **R**: R version 3.4.2 (2017-09-28)\n* **GPU clusters**: The following NVIDIA GPU libraries are installed:\n* Tesla driver 375.66\n* CUDA 8.0\n* CUDNN 6.0 \n### Pre-installed Python libraries \n| Library | Version | Library | Version | Library | Version |\n| --- | --- | --- | --- | --- | --- |\n| ansi2html | 1.1.1 | argparse | 1.2.1 | backports-abc | 0.5 |\n| boto | 2.42.0 | boto3 | 1.4.1 | botocore | 1.4.70 |\n| brewer2mpl | 1.4.1 | certifi | 2016.2.28 | cffi | 1.7.0 |\n| chardet | 2.3.0 | colorama | 0.3.7 | configobj | 5.0.6 |\n| cryptography | 1.5 | cycler | 0.10.0 | Cython | 0.24.1 |\n| decorator | 4.0.10 | docutils | 0.14 | enum34 | 1.1.6 |\n| et-xmlfile | 1.0.1 | freetype-py | 1.0.2 | funcsigs | 1.0.2 |\n| fusepy | 2.0.4 | futures | 3.1.1 | ggplot | 0.6.8 |\n| html5lib | 0.999 | idna | 2.1 | ipaddress | 1.0.16 |\n| ipython | 2.2.0 | ipython-genutils | 0.1.0 | jdcal | 1.2 |\n| Jinja2 | 2.8 | jmespath | 0.9.0 | llvmlite | 0.13.0 |\n| lxml | 3.6.4 | MarkupSafe | 0.23 | matplotlib | 1.5.3 |\n| mpld3 | 0.2 | msgpack-python | 0.4.7 | ndg-httpsclient | 0.3.3 |\n| numba | 0.28.1 | numpy | 1.11.1 | openpyxl | 2.3.2 |\n| pandas | 0.18.1 | pathlib2 | 2.1.0 | patsy | 0.4.1 |\n| pexpect | 4.0.1 | pickleshare | 0.7.4 | Pillow | 3.3.1 |\n| pip | 9.0.1 | ply | 3.9 | prompt-toolkit | 1.0.7 |\n| psycopg2 | 2.6.2 | ptyprocess | 0.5.1 | py4j | 0.10.3 |\n| pyarrow | 0.4.1 | pyasn1 | 0.1.9 | pycparser | 2.14 |\n| Pygments | 2.1.3 | PyGObject | 3.20.0 | pyOpenSSL | 16.0.0 |\n| 
pyparsing | 2.2.0 | pypng | 0.0.18 | Python | 2.7.12 |\n| python-dateutil | 2.5.3 | python-geohash | 0.8.5 | pytz | 2016.6.1 |\n| requests | 2.11.1 | s3transfer | 0.1.9 | scikit-learn | 0.18.1 |\n| scipy | 0.18.1 | scour | 0.32 | seaborn | 0.7.1 |\n| setuptools | 38.2.3 | simplejson | 3.8.2 | simples3 | 1.0 |\n| singledispatch | 3.4.0.3 | six | 1.10.0 | statsmodels | 0.6.1 |\n| tornado | 4.5.2 | traitlets | 4.3.0 | urllib3 | 1.19.1 |\n| virtualenv | 15.0.1 | wcwidth | 0.1.7 | wheel | 0.30.0 |\n| wsgiref | 0.1.2 | | | | | \n### Pre-installed R libraries \n| Library | Version | Library | Version | Library | Version |\n| --- | --- | --- | --- | --- | --- |\n| abind | 1.4-5 | assertthat | 0.2.0 | backports | 1.1.1 |\n| base | 3.4.2 | BH | 1.65.0-1 | bindr | 0.1 |\n| bindrcpp | 0.2 | bit | 1.1-12 | bit64 | 0.9-7 |\n| bitops | 1.0-6 | blob | 1.1.0 | boot | 1.3-20 |\n| brew | 1.0-6 | broom | 0.4.3 | car | 2.1-6 |\n| caret | 6.0-77 | chron | 2.3-51 | class | 7.3-14 |\n| cluster | 2.0.6 | codetools | 0.2-15 | colorspace | 1.3-2 |\n| commonmark | 1.4 | compiler | 3.4.2 | crayon | 1.3.4 |\n| curl | 3.0 | CVST | 0.2-1 | data.table | 1.10.4-3 |\n| datasets | 3.4.2 | DBI | 0.7 | ddalpha | 1.3.1 |\n| DEoptimR | 1.0-8 | desc | 1.1.1 | devtools | 1.13.4 |\n| dichromat | 2.0-0 | digest | 0.6.12 | dimRed | 0.1.0 |\n| doMC | 1.3.4 | dplyr | 0.7.4 | DRR | 0.0.2 |\n| foreach | 1.4.3 | foreign | 0.8-69 | gbm | 2.1.3 |\n| ggplot2 | 2.2.1 | git2r | 0.19.0 | glmnet | 2.0-13 |\n| glue | 1.2.0 | gower | 0.1.2 | graphics | 3.4.2 |\n| grDevices | 3.4.2 | grid | 3.4.2 | gsubfn | 0.6-6 |\n| gtable | 0.2.0 | h2o | 3.16.0.1 | httr | 1.3.1 |\n| hwriter | 1.3.2 | hwriterPlus | 1.0-3 | ipred | 0.9-6 |\n| iterators | 1.0.8 | jsonlite | 1.5 | kernlab | 0.9-25 |\n| KernSmooth | 2.23-15 | labeling | 0.3 | lattice | 0.20-35 |\n| lava | 1.5.1 | lazyeval | 0.2.1 | littler | 0.3.2 |\n| lme4 | 1.1-14 | lubridate | 1.7.1 | magrittr | 1.5 |\n| mapproj | 1.2-5 | maps | 3.2.0 | MASS | 7.3-47 |\n| Matrix | 1.2-11 | 
MatrixModels | 0.4-1 | memoise | 1.1.0 |\n| methods | 3.4.2 | mgcv | 1.8-22 | mime | 0.5 |\n| minqa | 1.2.4 | mnormt | 1.5-5 | ModelMetrics | 1.1.0 |\n| munsell | 0.4.3 | mvtnorm | 1.0-6 | nlme | 3.1-131 |\n| nloptr | 1.0.4 | nnet | 7.3-12 | numDeriv | 2016.8-1 |\n| openssl | 0.9.9 | parallel | 3.4.2 | pbkrtest | 0.4-7 |\n| pkgconfig | 2.0.1 | pkgKitten | 0.1.4 | plogr | 0.1-1 |\n| plyr | 1.8.4 | praise | 1.0.0 | pROC | 1.10.0 |\n| prodlim | 1.6.1 | proto | 1.0.0 | psych | 1.7.8 |\n| purrr | 0.2.4 | quantreg | 5.34 | R.methodsS3 | 1.7.1 |\n| R.oo | 1.21.0 | R.utils | 2.6.0 | R6 | 2.2.2 |\n| randomForest | 4.6-12 | RColorBrewer | 1.1-2 | Rcpp | 0.12.14 |\n| RcppEigen | 0.3.3.3.1 | RcppRoll | 0.2.2 | RCurl | 1.95-4.8 |\n| recipes | 0.1.1 | reshape2 | 1.4.2 | rlang | 0.1.4 |\n| robustbase | 0.92-8 | RODBC | 1.3-15 | roxygen2 | 6.0.1 |\n| rpart | 4.1-11 | rprojroot | 1.2 | Rserve | 1.7-3 |\n| RSQLite | 2.0 | rstudioapi | 0.7 | scales | 0.5.0 |\n| sfsmisc | 1.1-1 | sp | 1.2-5 | SparkR | 2.2.1 |\n| SparseM | 1.77 | spatial | 7.3-11 | splines | 3.4.2 |\n| sqldf | 0.4-11 | statmod | 1.4.30 | stats | 3.4.2 |\n| stats4 | 3.4.2 | stringi | 1.1.6 | stringr | 1.2.0 |\n| survival | 2.41-3 | tcltk | 3.4.2 | TeachingDemos | 2.10 |\n| testthat | 1.0.2 | tibble | 1.3.4 | tidyr | 0.7.2 |\n| tidyselect | 0.2.3 | timeDate | 3042.101 | tools | 3.4.2 |\n| utils | 3.4.2 | viridisLite | 0.2.0 | whisker | 0.3-2 |\n| withr | 2.1.0 | xml2 | 1.1.1 | | | \n### Pre-installed Java and Scala libraries (Scala 2.10 cluster version) \n| Group ID | Artifact ID | Version |\n| --- | --- | --- |\n| antlr | antlr | 2.7.7 |\n| com.amazonaws | amazon-kinesis-client | 1.7.3 |\n| com.amazonaws | aws-java-sdk-autoscaling | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudformation | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudfront | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudhsm | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudsearch | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudtrail | 1.11.126 
|\n| com.amazonaws | aws-java-sdk-cloudwatch | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudwatchmetrics | 1.11.126 |\n| com.amazonaws | aws-java-sdk-codedeploy | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cognitoidentity | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cognitosync | 1.11.126 |\n| com.amazonaws | aws-java-sdk-config | 1.11.126 |\n| com.amazonaws | aws-java-sdk-core | 1.11.126 |\n| com.amazonaws | aws-java-sdk-datapipeline | 1.11.126 |\n| com.amazonaws | aws-java-sdk-directconnect | 1.11.126 |\n| com.amazonaws | aws-java-sdk-directory | 1.11.126 |\n| com.amazonaws | aws-java-sdk-dynamodb | 1.11.126 |\n| com.amazonaws | aws-java-sdk-ec2 | 1.11.126 |\n| com.amazonaws | aws-java-sdk-ecs | 1.11.126 |\n| com.amazonaws | aws-java-sdk-efs | 1.11.126 |\n| com.amazonaws | aws-java-sdk-elasticache | 1.11.126 |\n| com.amazonaws | aws-java-sdk-elasticbeanstalk | 1.11.126 |\n| com.amazonaws | aws-java-sdk-elasticloadbalancing | 1.11.126 |\n| com.amazonaws | aws-java-sdk-elastictranscoder | 1.11.126 |\n| com.amazonaws | aws-java-sdk-emr | 1.11.126 |\n| com.amazonaws | aws-java-sdk-glacier | 1.11.126 |\n| com.amazonaws | aws-java-sdk-iam | 1.11.126 |\n| com.amazonaws | aws-java-sdk-importexport | 1.11.126 |\n| com.amazonaws | aws-java-sdk-kinesis | 1.11.126 |\n| com.amazonaws | aws-java-sdk-kms | 1.11.126 |\n| com.amazonaws | aws-java-sdk-lambda | 1.11.126 |\n| com.amazonaws | aws-java-sdk-logs | 1.11.126 |\n| com.amazonaws | aws-java-sdk-machinelearning | 1.11.126 |\n| com.amazonaws | aws-java-sdk-opsworks | 1.11.126 |\n| com.amazonaws | aws-java-sdk-rds | 1.11.126 |\n| com.amazonaws | aws-java-sdk-redshift | 1.11.126 |\n| com.amazonaws | aws-java-sdk-route53 | 1.11.126 |\n| com.amazonaws | aws-java-sdk-s3 | 1.11.126 |\n| com.amazonaws | aws-java-sdk-ses | 1.11.126 |\n| com.amazonaws | aws-java-sdk-simpledb | 1.11.126 |\n| com.amazonaws | aws-java-sdk-simpleworkflow | 1.11.126 |\n| com.amazonaws | aws-java-sdk-sns | 1.11.126 |\n| com.amazonaws | aws-java-sdk-sqs | 
1.11.126 |\n| com.amazonaws | aws-java-sdk-ssm | 1.11.126 |\n| com.amazonaws | aws-java-sdk-storagegateway | 1.11.126 |\n| com.amazonaws | aws-java-sdk-sts | 1.11.126 |\n| com.amazonaws | aws-java-sdk-support | 1.11.126 |\n| com.amazonaws | aws-java-sdk-swf-libraries | 1.11.22 |\n| com.amazonaws | aws-java-sdk-workspaces | 1.11.126 |\n| com.amazonaws | jmespath-java | 1.11.126 |\n| com.carrotsearch | hppc | 0.7.1 |\n| com.chuusai | shapeless\\_2.10 | 2.3.2 |\n| com.clearspring.analytics | stream | 2.7.0 |\n| com.databricks | Rserve | 1.8-3 |\n| com.databricks | dbml-local\\_2.10 | 0.2.2-db2-spark2.2 |\n| com.databricks | dbml-local\\_2.10-tests | 0.2.2-db2-spark2.2 |\n| com.databricks | jets3t | 0.7.1-0 |\n| com.databricks.scalapb | compilerplugin\\_2.10 | 0.4.15-9 |\n| com.databricks.scalapb | scalapb-runtime\\_2.10 | 0.4.15-9 |\n| com.esotericsoftware | kryo-shaded | 3.0.3 |\n| com.esotericsoftware | minlog | 1.3.0 |\n| com.fasterxml | classmate | 1.0.0 |\n| com.fasterxml.jackson.core | jackson-annotations | 2.6.7 |\n| com.fasterxml.jackson.core | jackson-core | 2.6.7 |\n| com.fasterxml.jackson.core | jackson-databind | 2.6.7.1 |\n| com.fasterxml.jackson.dataformat | jackson-dataformat-cbor | 2.6.7 |\n| com.fasterxml.jackson.datatype | jackson-datatype-joda | 2.6.7 |\n| com.fasterxml.jackson.module | jackson-module-paranamer | 2.6.7 |\n| com.fasterxml.jackson.module | jackson-module-scala\\_2.10 | 2.6.7.1 |\n| com.github.fommil | jniloader | 1.1 |\n| com.github.fommil.netlib | core | 1.1.2 |\n| com.github.fommil.netlib | native\\_ref-java | 1.1 |\n| com.github.fommil.netlib | native\\_ref-java-natives | 1.1 |\n| com.github.fommil.netlib | native\\_system-java | 1.1 |\n| com.github.fommil.netlib | native\\_system-java-natives | 1.1 |\n| com.github.fommil.netlib | netlib-native\\_ref-linux-x86\\_64-natives | 1.1 |\n| com.github.fommil.netlib | netlib-native\\_system-linux-x86\\_64-natives | 1.1 |\n| com.github.rwl | jtransforms | 2.4.0 |\n| com.google.code.findbugs 
| jsr305 | 2.0.1 |\n| com.google.code.gson | gson | 2.2.4 |\n| com.google.guava | guava | 15.0 |\n| com.google.protobuf | protobuf-java | 2.6.1 |\n| com.googlecode.javaewah | JavaEWAH | 0.3.2 |\n| com.h2database | h2 | 1.3.174 |\n| com.jamesmurty.utils | java-xmlbuilder | 1.0 |\n| com.jcraft | jsch | 0.1.50 |\n| com.jolbox | bonecp | 0.8.0.RELEASE |\n| com.mchange | c3p0 | 0.9.5.1 |\n| com.mchange | mchange-commons-java | 0.2.10 |\n| com.microsoft.azure | azure-data-lake-store-sdk | 2.0.11 |\n| com.microsoft.sqlserver | mssql-jdbc | 6.1.0.jre8 |\n| com.ning | compress-lzf | 1.0.3 |\n| com.sun.mail | javax.mail | 1.5.2 |\n| com.thoughtworks.paranamer | paranamer | 2.6 |\n| com.trueaccord.lenses | lenses\\_2.10 | 0.3 |\n| com.twitter | chill-java | 0.8.0 |\n| com.twitter | chill\\_2.10 | 0.8.0 |\n| com.twitter | parquet-hadoop-bundle | 1.6.0 |\n| com.twitter | util-app\\_2.10 | 6.23.0 |\n| com.twitter | util-core\\_2.10 | 6.23.0 |\n| com.twitter | util-jvm\\_2.10 | 6.23.0 |\n| com.typesafe | config | 1.2.1 |\n| com.typesafe | scalalogging-slf4j\\_2.10 | 1.1.0 |\n| com.univocity | univocity-parsers | 2.2.1 |\n| com.vlkan | flatbuffers | 1.2.0-3f79e055 |\n| com.zaxxer | HikariCP | 2.4.1 |\n| commons-beanutils | commons-beanutils | 1.7.0 |\n| commons-beanutils | commons-beanutils-core | 1.8.0 |\n| commons-cli | commons-cli | 1.2 |\n| commons-codec | commons-codec | 1.10 |\n| commons-collections | commons-collections | 3.2.2 |\n| commons-configuration | commons-configuration | 1.6 |\n| commons-dbcp | commons-dbcp | 1.4 |\n| commons-digester | commons-digester | 1.8 |\n| commons-httpclient | commons-httpclient | 3.1 |\n| commons-io | commons-io | 2.4 |\n| commons-lang | commons-lang | 2.6 |\n| commons-logging | commons-logging | 1.1.3 |\n| commons-net | commons-net | 2.2 |\n| commons-pool | commons-pool | 1.5.4 |\n| info.ganglia.gmetric4j | gmetric4j | 1.0.7 |\n| io.dropwizard.metrics | metrics-core | 3.1.2 |\n| io.dropwizard.metrics | metrics-ganglia | 3.1.2 |\n| 
io.dropwizard.metrics | metrics-graphite | 3.1.2 |\n| io.dropwizard.metrics | metrics-healthchecks | 3.1.2 |\n| io.dropwizard.metrics | metrics-jetty9 | 3.1.2 |\n| io.dropwizard.metrics | metrics-json | 3.1.2 |\n| io.dropwizard.metrics | metrics-jvm | 3.1.2 |\n| io.dropwizard.metrics | metrics-log4j | 3.1.2 |\n| io.dropwizard.metrics | metrics-servlets | 3.1.2 |\n| io.netty | netty | 3.9.9.Final |\n| io.netty | netty-all | 4.0.43.Final |\n| io.prometheus | simpleclient | 0.0.16 |\n| io.prometheus | simpleclient\\_common | 0.0.16 |\n| io.prometheus | simpleclient\\_dropwizard | 0.0.16 |\n| io.prometheus | simpleclient\\_servlet | 0.0.16 |\n| io.prometheus.jmx | collector | 0.7 |\n| javax.activation | activation | 1.1.1 |\n| javax.annotation | javax.annotation-api | 1.2 |\n| javax.el | javax.el-api | 2.2.4 |\n| javax.jdo | jdo-api | 3.0.1 |\n| javax.servlet | javax.servlet-api | 3.1.0 |\n| javax.servlet.jsp | jsp-api | 2.1 |\n| javax.transaction | jta | 1.1 |\n| javax.validation | validation-api | 1.1.0.Final |\n| javax.ws.rs | javax.ws.rs-api | 2.0.1 |\n| javax.xml.bind | jaxb-api | 2.2.2 |\n| javax.xml.stream | stax-api | 1.0-2 |\n| javolution | javolution | 5.5.1 |\n| jline | jline | 2.11 |\n| joda-time | joda-time | 2.9.3 |\n| log4j | apache-log4j-extras | 1.2.17 |\n| log4j | log4j | 1.2.17 |\n| mx4j | mx4j | 3.0.2 |\n| net.hydromatic | eigenbase-properties | 1.1.5 |\n| net.iharder | base64 | 2.3.8 |\n| net.java.dev.jets3t | jets3t | 0.9.3 |\n| net.jpountz.lz4 | lz4 | 1.3.0 |\n| net.razorvine | pyrolite | 4.13 |\n| net.sf.jpam | jpam | 1.1 |\n| net.sf.opencsv | opencsv | 2.3 |\n| net.sf.supercsv | super-csv | 2.2.0 |\n| net.sourceforge.f2j | arpack\\_combined\\_all | 0.1 |\n| org.acplt | oncrpc | 1.0.7 |\n| org.antlr | ST4 | 4.0.4 |\n| org.antlr | antlr-runtime | 3.4 |\n| org.antlr | antlr4-runtime | 4.5.3 |\n| org.antlr | stringtemplate | 3.2.1 |\n| org.apache.ant | ant | 1.9.2 |\n| org.apache.ant | ant-jsch | 1.9.2 |\n| org.apache.ant | ant-launcher | 1.9.2 
|\n| org.apache.arrow | arrow-format | 0.4.0 |\n| org.apache.arrow | arrow-memory | 0.4.0 |\n| org.apache.arrow | arrow-vector | 0.4.0 |\n| org.apache.avro | avro | 1.7.7 |\n| org.apache.avro | avro-ipc | 1.7.7 |\n| org.apache.avro | avro-ipc-tests | 1.7.7 |\n| org.apache.avro | avro-mapred-hadoop2 | 1.7.7 |\n| org.apache.calcite | calcite-avatica | 1.2.0-incubating |\n| org.apache.calcite | calcite-core | 1.2.0-incubating |\n| org.apache.calcite | calcite-linq4j | 1.2.0-incubating |\n| org.apache.commons | commons-compress | 1.4.1 |\n| org.apache.commons | commons-crypto | 1.0.0 |\n| org.apache.commons | commons-lang3 | 3.5 |\n| org.apache.commons | commons-math3 | 3.4.1 |\n| org.apache.curator | curator-client | 2.6.0 |\n| org.apache.curator | curator-framework | 2.6.0 |\n| org.apache.curator | curator-recipes | 2.6.0 |\n| org.apache.derby | derby | 10.10.2.0 |\n| org.apache.directory.api | api-asn1-api | 1.0.0-M20 |\n| org.apache.directory.api | api-util | 1.0.0-M20 |\n| org.apache.directory.server | apacheds-i18n | 2.0.0-M15 |\n| org.apache.directory.server | apacheds-kerberos-codec | 2.0.0-M15 |\n| org.apache.hadoop | hadoop-annotations | 2.7.3 |\n| org.apache.hadoop | hadoop-auth | 2.7.3 |\n| org.apache.hadoop | hadoop-client | 2.7.3 |\n| org.apache.hadoop | hadoop-common | 2.7.3 |\n| org.apache.hadoop | hadoop-hdfs | 2.7.3 |\n| org.apache.hadoop | hadoop-mapreduce-client-app | 2.7.3 |\n| org.apache.hadoop | hadoop-mapreduce-client-common | 2.7.3 |\n| org.apache.hadoop | hadoop-mapreduce-client-core | 2.7.3 |\n| org.apache.hadoop | hadoop-mapreduce-client-jobclient | 2.7.3 |\n| org.apache.hadoop | hadoop-mapreduce-client-shuffle | 2.7.3 |\n| org.apache.hadoop | hadoop-yarn-api | 2.7.3 |\n| org.apache.hadoop | hadoop-yarn-client | 2.7.3 |\n| org.apache.hadoop | hadoop-yarn-common | 2.7.3 |\n| org.apache.hadoop | hadoop-yarn-server-common | 2.7.3 |\n| org.apache.htrace | htrace-core | 3.1.0-incubating |\n| org.apache.httpcomponents | httpclient | 4.5.2 |\n| 
org.apache.httpcomponents | httpcore | 4.4.4 |\n| org.apache.ivy | ivy | 2.4.0 |\n| org.apache.parquet | parquet-column | 1.8.2-databricks1 |\n| org.apache.parquet | parquet-common | 1.8.2-databricks1 |\n| org.apache.parquet | parquet-encoding | 1.8.2-databricks1 |\n| org.apache.parquet | parquet-format | 2.3.1 |\n| org.apache.parquet | parquet-hadoop | 1.8.2-databricks1 |\n| org.apache.parquet | parquet-jackson | 1.8.2-databricks1 |\n| org.apache.thrift | libfb303 | 0.9.3 |\n| org.apache.thrift | libthrift | 0.9.3 |\n| org.apache.xbean | xbean-asm5-shaded | 4.4 |\n| org.apache.zookeeper | zookeeper | 3.4.6 |\n| org.bouncycastle | bcprov-jdk15on | 1.51 |\n| org.codehaus.jackson | jackson-core-asl | 1.9.13 |\n| org.codehaus.jackson | jackson-jaxrs | 1.9.13 |\n| org.codehaus.jackson | jackson-mapper-asl | 1.9.13 |\n| org.codehaus.jackson | jackson-xc | 1.9.13 |\n| org.codehaus.janino | commons-compiler | 3.0.0 |\n| org.codehaus.janino | janino | 3.0.0 |\n| org.datanucleus | datanucleus-api-jdo | 3.2.6 |\n| org.datanucleus | datanucleus-core | 3.2.10 |\n| org.datanucleus | datanucleus-rdbms | 3.2.9 |\n| org.eclipse.jetty | jetty-client | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-continuation | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-http | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-io | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-jndi | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-plus | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-proxy | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-security | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-server | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-servlet | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-servlets | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-util | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-webapp | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-xml | 9.3.11.v20160721 |\n| org.fusesource.jansi | jansi | 1.4 |\n| org.fusesource.leveldbjni | leveldbjni-all | 1.8 
|\n| org.glassfish.hk2 | hk2-api | 2.4.0-b34 |\n| org.glassfish.hk2 | hk2-locator | 2.4.0-b34 |\n| org.glassfish.hk2 | hk2-utils | 2.4.0-b34 |\n| org.glassfish.hk2 | osgi-resource-locator | 1.0.1 |\n| org.glassfish.hk2.external | aopalliance-repackaged | 2.4.0-b34 |\n| org.glassfish.hk2.external | javax.inject | 2.4.0-b34 |\n| org.glassfish.jersey.bundles.repackaged | jersey-guava | 2.22.2 |\n| org.glassfish.jersey.containers | jersey-container-servlet | 2.22.2 |\n| org.glassfish.jersey.containers | jersey-container-servlet-core | 2.22.2 |\n| org.glassfish.jersey.core | jersey-client | 2.22.2 |\n| org.glassfish.jersey.core | jersey-common | 2.22.2 |\n| org.glassfish.jersey.core | jersey-server | 2.22.2 |\n| org.glassfish.jersey.media | jersey-media-jaxb | 2.22.2 |\n| org.hibernate | hibernate-validator | 5.1.1.Final |\n| org.iq80.snappy | snappy | 0.2 |\n| org.javassist | javassist | 3.18.1-GA |\n| org.jboss.logging | jboss-logging | 3.1.3.GA |\n| org.jdbi | jdbi | 2.63.1 |\n| org.joda | joda-convert | 1.7 |\n| org.jodd | jodd-core | 3.5.2 |\n| org.jpmml | pmml-model | 1.2.15 |\n| org.jpmml | pmml-schema | 1.2.15 |\n| org.json4s | json4s-ast\\_2.10 | 3.2.11 |\n| org.json4s | json4s-core\\_2.10 | 3.2.11 |\n| org.json4s | json4s-jackson\\_2.10 | 3.2.11 |\n| org.mariadb.jdbc | mariadb-java-client | 2.1.2 |\n| org.mockito | mockito-all | 1.9.5 |\n| org.objenesis | objenesis | 2.1 |\n| org.postgresql | postgresql | 9.4-1204-jdbc41 |\n| org.roaringbitmap | RoaringBitmap | 0.5.11 |\n| org.rocksdb | rocksdbjni | 5.2.1 |\n| org.rosuda.REngine | REngine | 2.1.0 |\n| org.scala-lang | jline | 2.10.6 |\n| org.scala-lang | scala-compiler\\_2.10 | 2.10.6 |\n| org.scala-lang | scala-library\\_2.10 | 2.10.6 |\n| org.scala-lang | scala-reflect\\_2.10 | 2.10.6 |\n| org.scala-lang | scalap\\_2.10 | 2.10.6 |\n| org.scala-sbt | test-interface | 1.0 |\n| org.scalacheck | scalacheck\\_2.10 | 1.12.5 |\n| org.scalamacros | quasiquotes\\_2.10 | 2.0.0 |\n| org.scalanlp | breeze-macros\\_2.10 
| 0.13.2 |\n| org.scalanlp | breeze\\_2.10 | 0.13.2 |\n| org.scalatest | scalatest\\_2.10 | 2.2.6 |\n| org.slf4j | jcl-over-slf4j | 1.7.16 |\n| org.slf4j | jul-to-slf4j | 1.7.16 |\n| org.slf4j | slf4j-api | 1.7.16 |\n| org.slf4j | slf4j-log4j12 | 1.7.16 |\n| org.spark-project.hive | hive-beeline | 1.2.1.spark2 |\n| org.spark-project.hive | hive-cli | 1.2.1.spark2 |\n| org.spark-project.hive | hive-exec | 1.2.1.spark2 |\n| org.spark-project.hive | hive-jdbc | 1.2.1.spark2 |\n| org.spark-project.hive | hive-metastore | 1.2.1.spark2 |\n| org.spark-project.spark | unused | 1.0.0 |\n| org.spire-math | spire-macros\\_2.10 | 0.13.0 |\n| org.spire-math | spire\\_2.10 | 0.13.0 |\n| org.springframework | spring-core | 4.1.4.RELEASE |\n| org.springframework | spring-test | 4.1.4.RELEASE |\n| org.tukaani | xz | 1.0 |\n| org.typelevel | machinist\\_2.10 | 0.6.1 |\n| org.typelevel | macro-compat\\_2.10 | 1.1.1 |\n| org.xerial | sqlite-jdbc | 3.8.11.2 |\n| org.xerial.snappy | snappy-java | 1.1.2.6 |\n| org.yaml | snakeyaml | 1.16 |\n| oro | oro | 2.0.8 |\n| software.amazon.ion | ion-java | 1.0.2 |\n| stax | stax-api | 1.0.1 |\n| xmlenc | xmlenc | 0.52 | \n### Pre-installed Java and Scala libraries (Scala 2.11 cluster version) \n| Group ID | Artifact ID | Version |\n| --- | --- | --- |\n| antlr | antlr | 2.7.7 |\n| com.amazonaws | amazon-kinesis-client | 1.7.3 |\n| com.amazonaws | aws-java-sdk-autoscaling | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudformation | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudfront | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudhsm | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudsearch | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudtrail | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudwatch | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cloudwatchmetrics | 1.11.126 |\n| com.amazonaws | aws-java-sdk-codedeploy | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cognitoidentity | 1.11.126 |\n| com.amazonaws | aws-java-sdk-cognitosync | 1.11.126 
|\n| com.amazonaws | aws-java-sdk-config | 1.11.126 |\n| com.amazonaws | aws-java-sdk-core | 1.11.126 |\n| com.amazonaws | aws-java-sdk-datapipeline | 1.11.126 |\n| com.amazonaws | aws-java-sdk-directconnect | 1.11.126 |\n| com.amazonaws | aws-java-sdk-directory | 1.11.126 |\n| com.amazonaws | aws-java-sdk-dynamodb | 1.11.126 |\n| com.amazonaws | aws-java-sdk-ec2 | 1.11.126 |\n| com.amazonaws | aws-java-sdk-ecs | 1.11.126 |\n| com.amazonaws | aws-java-sdk-efs | 1.11.126 |\n| com.amazonaws | aws-java-sdk-elasticache | 1.11.126 |\n| com.amazonaws | aws-java-sdk-elasticbeanstalk | 1.11.126 |\n| com.amazonaws | aws-java-sdk-elasticloadbalancing | 1.11.126 |\n| com.amazonaws | aws-java-sdk-elastictranscoder | 1.11.126 |\n| com.amazonaws | aws-java-sdk-emr | 1.11.126 |\n| com.amazonaws | aws-java-sdk-glacier | 1.11.126 |\n| com.amazonaws | aws-java-sdk-iam | 1.11.126 |\n| com.amazonaws | aws-java-sdk-importexport | 1.11.126 |\n| com.amazonaws | aws-java-sdk-kinesis | 1.11.126 |\n| com.amazonaws | aws-java-sdk-kms | 1.11.126 |\n| com.amazonaws | aws-java-sdk-lambda | 1.11.126 |\n| com.amazonaws | aws-java-sdk-logs | 1.11.126 |\n| com.amazonaws | aws-java-sdk-machinelearning | 1.11.126 |\n| com.amazonaws | aws-java-sdk-opsworks | 1.11.126 |\n| com.amazonaws | aws-java-sdk-rds | 1.11.126 |\n| com.amazonaws | aws-java-sdk-redshift | 1.11.126 |\n| com.amazonaws | aws-java-sdk-route53 | 1.11.126 |\n| com.amazonaws | aws-java-sdk-s3 | 1.11.126 |\n| com.amazonaws | aws-java-sdk-ses | 1.11.126 |\n| com.amazonaws | aws-java-sdk-simpledb | 1.11.126 |\n| com.amazonaws | aws-java-sdk-simpleworkflow | 1.11.126 |\n| com.amazonaws | aws-java-sdk-sns | 1.11.126 |\n| com.amazonaws | aws-java-sdk-sqs | 1.11.126 |\n| com.amazonaws | aws-java-sdk-ssm | 1.11.126 |\n| com.amazonaws | aws-java-sdk-storagegateway | 1.11.126 |\n| com.amazonaws | aws-java-sdk-sts | 1.11.126 |\n| com.amazonaws | aws-java-sdk-support | 1.11.126 |\n| com.amazonaws | aws-java-sdk-swf-libraries | 1.11.22 |\n| 
com.amazonaws | aws-java-sdk-workspaces | 1.11.126 |\n| com.amazonaws | jmespath-java | 1.11.126 |\n| com.carrotsearch | hppc | 0.7.1 |\n| com.chuusai | shapeless\\_2.11 | 2.3.2 |\n| com.clearspring.analytics | stream | 2.7.0 |\n| com.databricks | Rserve | 1.8-3 |\n| com.databricks | dbml-local\\_2.11 | 0.2.2-db2-spark2.2 |\n| com.databricks | dbml-local\\_2.11-tests | 0.2.2-db2-spark2.2 |\n| com.databricks | jets3t | 0.7.1-0 |\n| com.databricks.scalapb | compilerplugin\\_2.11 | 0.4.15-9 |\n| com.databricks.scalapb | scalapb-runtime\\_2.11 | 0.4.15-9 |\n| com.esotericsoftware | kryo-shaded | 3.0.3 |\n| com.esotericsoftware | minlog | 1.3.0 |\n| com.fasterxml | classmate | 1.0.0 |\n| com.fasterxml.jackson.core | jackson-annotations | 2.6.7 |\n| com.fasterxml.jackson.core | jackson-core | 2.6.7 |\n| com.fasterxml.jackson.core | jackson-databind | 2.6.7.1 |\n| com.fasterxml.jackson.dataformat | jackson-dataformat-cbor | 2.6.7 |\n| com.fasterxml.jackson.datatype | jackson-datatype-joda | 2.6.7 |\n| com.fasterxml.jackson.module | jackson-module-paranamer | 2.6.7 |\n| com.fasterxml.jackson.module | jackson-module-scala\\_2.11 | 2.6.7.1 |\n| com.github.fommil | jniloader | 1.1 |\n| com.github.fommil.netlib | core | 1.1.2 |\n| com.github.fommil.netlib | native\\_ref-java | 1.1 |\n| com.github.fommil.netlib | native\\_ref-java-natives | 1.1 |\n| com.github.fommil.netlib | native\\_system-java | 1.1 |\n| com.github.fommil.netlib | native\\_system-java-natives | 1.1 |\n| com.github.fommil.netlib | netlib-native\\_ref-linux-x86\\_64-natives | 1.1 |\n| com.github.fommil.netlib | netlib-native\\_system-linux-x86\\_64-natives | 1.1 |\n| com.github.rwl | jtransforms | 2.4.0 |\n| com.google.code.findbugs | jsr305 | 2.0.1 |\n| com.google.code.gson | gson | 2.2.4 |\n| com.google.guava | guava | 15.0 |\n| com.google.protobuf | protobuf-java | 2.6.1 |\n| com.googlecode.javaewah | JavaEWAH | 0.3.2 |\n| com.h2database | h2 | 1.3.174 |\n| com.jamesmurty.utils | java-xmlbuilder | 1.0 |\n| 
com.jcraft | jsch | 0.1.50 |\n| com.jolbox | bonecp | 0.8.0.RELEASE |\n| com.mchange | c3p0 | 0.9.5.1 |\n| com.mchange | mchange-commons-java | 0.2.10 |\n| com.microsoft.azure | azure-data-lake-store-sdk | 2.0.11 |\n| com.microsoft.sqlserver | mssql-jdbc | 6.1.0.jre8 |\n| com.ning | compress-lzf | 1.0.3 |\n| com.sun.mail | javax.mail | 1.5.2 |\n| com.thoughtworks.paranamer | paranamer | 2.6 |\n| com.trueaccord.lenses | lenses\\_2.11 | 0.3 |\n| com.twitter | chill-java | 0.8.0 |\n| com.twitter | chill\\_2.11 | 0.8.0 |\n| com.twitter | parquet-hadoop-bundle | 1.6.0 |\n| com.twitter | util-app\\_2.11 | 6.23.0 |\n| com.twitter | util-core\\_2.11 | 6.23.0 |\n| com.twitter | util-jvm\\_2.11 | 6.23.0 |\n| com.typesafe | config | 1.2.1 |\n| com.typesafe.scala-logging | scala-logging-api\\_2.11 | 2.1.2 |\n| com.typesafe.scala-logging | scala-logging-slf4j\\_2.11 | 2.1.2 |\n| com.univocity | univocity-parsers | 2.2.1 |\n| com.vlkan | flatbuffers | 1.2.0-3f79e055 |\n| com.zaxxer | HikariCP | 2.4.1 |\n| commons-beanutils | commons-beanutils | 1.7.0 |\n| commons-beanutils | commons-beanutils-core | 1.8.0 |\n| commons-cli | commons-cli | 1.2 |\n| commons-codec | commons-codec | 1.10 |\n| commons-collections | commons-collections | 3.2.2 |\n| commons-configuration | commons-configuration | 1.6 |\n| commons-dbcp | commons-dbcp | 1.4 |\n| commons-digester | commons-digester | 1.8 |\n| commons-httpclient | commons-httpclient | 3.1 |\n| commons-io | commons-io | 2.4 |\n| commons-lang | commons-lang | 2.6 |\n| commons-logging | commons-logging | 1.1.3 |\n| commons-net | commons-net | 2.2 |\n| commons-pool | commons-pool | 1.5.4 |\n| info.ganglia.gmetric4j | gmetric4j | 1.0.7 |\n| io.dropwizard.metrics | metrics-core | 3.1.2 |\n| io.dropwizard.metrics | metrics-ganglia | 3.1.2 |\n| io.dropwizard.metrics | metrics-graphite | 3.1.2 |\n| io.dropwizard.metrics | metrics-healthchecks | 3.1.2 |\n| io.dropwizard.metrics | metrics-jetty9 | 3.1.2 |\n| io.dropwizard.metrics | metrics-json | 
3.1.2 |\n| io.dropwizard.metrics | metrics-jvm | 3.1.2 |\n| io.dropwizard.metrics | metrics-log4j | 3.1.2 |\n| io.dropwizard.metrics | metrics-servlets | 3.1.2 |\n| io.netty | netty | 3.9.9.Final |\n| io.netty | netty-all | 4.0.43.Final |\n| io.prometheus | simpleclient | 0.0.16 |\n| io.prometheus | simpleclient\\_common | 0.0.16 |\n| io.prometheus | simpleclient\\_dropwizard | 0.0.16 |\n| io.prometheus | simpleclient\\_servlet | 0.0.16 |\n| io.prometheus.jmx | collector | 0.7 |\n| javax.activation | activation | 1.1.1 |\n| javax.annotation | javax.annotation-api | 1.2 |\n| javax.el | javax.el-api | 2.2.4 |\n| javax.jdo | jdo-api | 3.0.1 |\n| javax.servlet | javax.servlet-api | 3.1.0 |\n| javax.servlet.jsp | jsp-api | 2.1 |\n| javax.transaction | jta | 1.1 |\n| javax.validation | validation-api | 1.1.0.Final |\n| javax.ws.rs | javax.ws.rs-api | 2.0.1 |\n| javax.xml.bind | jaxb-api | 2.2.2 |\n| javax.xml.stream | stax-api | 1.0-2 |\n| javolution | javolution | 5.5.1 |\n| jline | jline | 2.11 |\n| joda-time | joda-time | 2.9.3 |\n| log4j | apache-log4j-extras | 1.2.17 |\n| log4j | log4j | 1.2.17 |\n| mx4j | mx4j | 3.0.2 |\n| net.hydromatic | eigenbase-properties | 1.1.5 |\n| net.iharder | base64 | 2.3.8 |\n| net.java.dev.jets3t | jets3t | 0.9.3 |\n| net.jpountz.lz4 | lz4 | 1.3.0 |\n| net.razorvine | pyrolite | 4.13 |\n| net.sf.jpam | jpam | 1.1 |\n| net.sf.opencsv | opencsv | 2.3 |\n| net.sf.supercsv | super-csv | 2.2.0 |\n| net.sourceforge.f2j | arpack\\_combined\\_all | 0.1 |\n| org.acplt | oncrpc | 1.0.7 |\n| org.antlr | ST4 | 4.0.4 |\n| org.antlr | antlr-runtime | 3.4 |\n| org.antlr | antlr4-runtime | 4.5.3 |\n| org.antlr | stringtemplate | 3.2.1 |\n| org.apache.ant | ant | 1.9.2 |\n| org.apache.ant | ant-jsch | 1.9.2 |\n| org.apache.ant | ant-launcher | 1.9.2 |\n| org.apache.arrow | arrow-format | 0.4.0 |\n| org.apache.arrow | arrow-memory | 0.4.0 |\n| org.apache.arrow | arrow-vector | 0.4.0 |\n| org.apache.avro | avro | 1.7.7 |\n| org.apache.avro | avro-ipc | 
1.7.7 |\n| org.apache.avro | avro-ipc-tests | 1.7.7 |\n| org.apache.avro | avro-mapred-hadoop2 | 1.7.7 |\n| org.apache.calcite | calcite-avatica | 1.2.0-incubating |\n| org.apache.calcite | calcite-core | 1.2.0-incubating |\n| org.apache.calcite | calcite-linq4j | 1.2.0-incubating |\n| org.apache.commons | commons-compress | 1.4.1 |\n| org.apache.commons | commons-crypto | 1.0.0 |\n| org.apache.commons | commons-lang3 | 3.5 |\n| org.apache.commons | commons-math3 | 3.4.1 |\n| org.apache.curator | curator-client | 2.6.0 |\n| org.apache.curator | curator-framework | 2.6.0 |\n| org.apache.curator | curator-recipes | 2.6.0 |\n| org.apache.derby | derby | 10.10.2.0 |\n| org.apache.directory.api | api-asn1-api | 1.0.0-M20 |\n| org.apache.directory.api | api-util | 1.0.0-M20 |\n| org.apache.directory.server | apacheds-i18n | 2.0.0-M15 |\n| org.apache.directory.server | apacheds-kerberos-codec | 2.0.0-M15 |\n| org.apache.hadoop | hadoop-annotations | 2.7.3 |\n| org.apache.hadoop | hadoop-auth | 2.7.3 |\n| org.apache.hadoop | hadoop-client | 2.7.3 |\n| org.apache.hadoop | hadoop-common | 2.7.3 |\n| org.apache.hadoop | hadoop-hdfs | 2.7.3 |\n| org.apache.hadoop | hadoop-mapreduce-client-app | 2.7.3 |\n| org.apache.hadoop | hadoop-mapreduce-client-common | 2.7.3 |\n| org.apache.hadoop | hadoop-mapreduce-client-core | 2.7.3 |\n| org.apache.hadoop | hadoop-mapreduce-client-jobclient | 2.7.3 |\n| org.apache.hadoop | hadoop-mapreduce-client-shuffle | 2.7.3 |\n| org.apache.hadoop | hadoop-yarn-api | 2.7.3 |\n| org.apache.hadoop | hadoop-yarn-client | 2.7.3 |\n| org.apache.hadoop | hadoop-yarn-common | 2.7.3 |\n| org.apache.hadoop | hadoop-yarn-server-common | 2.7.3 |\n| org.apache.htrace | htrace-core | 3.1.0-incubating |\n| org.apache.httpcomponents | httpclient | 4.5.2 |\n| org.apache.httpcomponents | httpcore | 4.4.4 |\n| org.apache.ivy | ivy | 2.4.0 |\n| org.apache.parquet | parquet-column | 1.8.2-databricks1 |\n| org.apache.parquet | parquet-common | 1.8.2-databricks1 |\n| 
org.apache.parquet | parquet-encoding | 1.8.2-databricks1 |\n| org.apache.parquet | parquet-format | 2.3.1 |\n| org.apache.parquet | parquet-hadoop | 1.8.2-databricks1 |\n| org.apache.parquet | parquet-jackson | 1.8.2-databricks1 |\n| org.apache.thrift | libfb303 | 0.9.3 |\n| org.apache.thrift | libthrift | 0.9.3 |\n| org.apache.xbean | xbean-asm5-shaded | 4.4 |\n| org.apache.zookeeper | zookeeper | 3.4.6 |\n| org.bouncycastle | bcprov-jdk15on | 1.51 |\n| org.codehaus.jackson | jackson-core-asl | 1.9.13 |\n| org.codehaus.jackson | jackson-jaxrs | 1.9.13 |\n| org.codehaus.jackson | jackson-mapper-asl | 1.9.13 |\n| org.codehaus.jackson | jackson-xc | 1.9.13 |\n| org.codehaus.janino | commons-compiler | 3.0.0 |\n| org.codehaus.janino | janino | 3.0.0 |\n| org.datanucleus | datanucleus-api-jdo | 3.2.6 |\n| org.datanucleus | datanucleus-core | 3.2.10 |\n| org.datanucleus | datanucleus-rdbms | 3.2.9 |\n| org.eclipse.jetty | jetty-client | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-continuation | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-http | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-io | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-jndi | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-plus | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-proxy | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-security | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-server | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-servlet | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-servlets | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-util | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-webapp | 9.3.11.v20160721 |\n| org.eclipse.jetty | jetty-xml | 9.3.11.v20160721 |\n| org.fusesource.leveldbjni | leveldbjni-all | 1.8 |\n| org.glassfish.hk2 | hk2-api | 2.4.0-b34 |\n| org.glassfish.hk2 | hk2-locator | 2.4.0-b34 |\n| org.glassfish.hk2 | hk2-utils | 2.4.0-b34 |\n| org.glassfish.hk2 | osgi-resource-locator | 1.0.1 |\n| org.glassfish.hk2.external | 
aopalliance-repackaged | 2.4.0-b34 |\n| org.glassfish.hk2.external | javax.inject | 2.4.0-b34 |\n| org.glassfish.jersey.bundles.repackaged | jersey-guava | 2.22.2 |\n| org.glassfish.jersey.containers | jersey-container-servlet | 2.22.2 |\n| org.glassfish.jersey.containers | jersey-container-servlet-core | 2.22.2 |\n| org.glassfish.jersey.core | jersey-client | 2.22.2 |\n| org.glassfish.jersey.core | jersey-common | 2.22.2 |\n| org.glassfish.jersey.core | jersey-server | 2.22.2 |\n| org.glassfish.jersey.media | jersey-media-jaxb | 2.22.2 |\n| org.hibernate | hibernate-validator | 5.1.1.Final |\n| org.iq80.snappy | snappy | 0.2 |\n| org.javassist | javassist | 3.18.1-GA |\n| org.jboss.logging | jboss-logging | 3.1.3.GA |\n| org.jdbi | jdbi | 2.63.1 |\n| org.joda | joda-convert | 1.7 |\n| org.jodd | jodd-core | 3.5.2 |\n| org.jpmml | pmml-model | 1.2.15 |\n| org.jpmml | pmml-schema | 1.2.15 |\n| org.json4s | json4s-ast\\_2.11 | 3.2.11 |\n| org.json4s | json4s-core\\_2.11 | 3.2.11 |\n| org.json4s | json4s-jackson\\_2.11 | 3.2.11 |\n| org.mariadb.jdbc | mariadb-java-client | 2.1.2 |\n| org.mockito | mockito-all | 1.9.5 |\n| org.objenesis | objenesis | 2.1 |\n| org.postgresql | postgresql | 9.4-1204-jdbc41 |\n| org.roaringbitmap | RoaringBitmap | 0.5.11 |\n| org.rocksdb | rocksdbjni | 5.2.1 |\n| org.rosuda.REngine | REngine | 2.1.0 |\n| org.scala-lang | scala-compiler\\_2.11 | 2.11.8 |\n| org.scala-lang | scala-library\\_2.11 | 2.11.8 |\n| org.scala-lang | scala-reflect\\_2.11 | 2.11.8 |\n| org.scala-lang | scalap\\_2.11 | 2.11.8 |\n| org.scala-lang.modules | scala-parser-combinators\\_2.11 | 1.0.2 |\n| org.scala-lang.modules | scala-xml\\_2.11 | 1.0.2 |\n| org.scala-sbt | test-interface | 1.0 |\n| org.scalacheck | scalacheck\\_2.11 | 1.12.5 |\n| org.scalanlp | breeze-macros\\_2.11 | 0.13.2 |\n| org.scalanlp | breeze\\_2.11 | 0.13.2 |\n| org.scalatest | scalatest\\_2.11 | 2.2.6 |\n| org.slf4j | jcl-over-slf4j | 1.7.16 |\n| org.slf4j | jul-to-slf4j | 1.7.16 |\n| org.slf4j 
| slf4j-api | 1.7.16 |\n| org.slf4j | slf4j-log4j12 | 1.7.16 |\n| org.spark-project.hive | hive-beeline | 1.2.1.spark2 |\n| org.spark-project.hive | hive-cli | 1.2.1.spark2 |\n| org.spark-project.hive | hive-exec | 1.2.1.spark2 |\n| org.spark-project.hive | hive-jdbc | 1.2.1.spark2 |\n| org.spark-project.hive | hive-metastore | 1.2.1.spark2 |\n| org.spark-project.spark | unused | 1.0.0 |\n| org.spire-math | spire-macros\\_2.11 | 0.13.0 |\n| org.spire-math | spire\\_2.11 | 0.13.0 |\n| org.springframework | spring-core | 4.1.4.RELEASE |\n| org.springframework | spring-test | 4.1.4.RELEASE |\n| org.tukaani | xz | 1.0 |\n| org.typelevel | machinist\\_2.11 | 0.6.1 |\n| org.typelevel | macro-compat\\_2.11 | 1.1.1 |\n| org.xerial | sqlite-jdbc | 3.8.11.2 |\n| org.xerial.snappy | snappy-java | 1.1.2.6 |\n| org.yaml | snakeyaml | 1.16 |\n| oro | oro | 2.0.8 |\n| software.amazon.ion | ion-java | 1.0.2 |\n| stax | stax-api | 1.0.1 |\n| xmlenc | xmlenc | 0.52 |\n\n", "chunk_id": "eebc56b15ec0229d009f8629453ba769", "url": "https://docs.databricks.com/archive/runtime-release-notes/3.5.html"} +{"chunked_text": "# \n### Sign up for Databricks Community edition\n\nThis article describes how to sign up for **Databricks Community Edition**. Unlike the [Databricks Free Trial](https://docs.databricks.com/getting-started/free-trial.html), Community Edition doesn\u2019t require that you have your own cloud account or supply cloud compute or storage resources. \nHowever, several features available in the Databricks Platform Free Trial, such as the [REST API](https://docs.databricks.com/api/workspace/introduction), are not available in Databricks Community Edition. For details, see [Databricks Community Edition FAQ](https://databricks.com/product/faq/community-edition). \nTo sign up for Databricks Community Edition: \n1. Click [Try Databricks](https://databricks.com/try-databricks) here or at the top of this page.\n2. Enter your name, company, email, and title, and click **Continue**.\n3. 
On the **Choose a cloud provider** dialog, click the **Get started with Community Edition** link. You\u2019ll see a page announcing that an email has been sent to the address you provided. \n![Try Databricks](https://docs.databricks.com/_images/try.png)\n4. Look for the welcome email and click the link to verify your email address. You are prompted to create your Databricks password.\n5. When you click **Submit**, you\u2019ll be taken to the Databricks Community Edition home page. \n![Community Edition landing page](https://docs.databricks.com/_images/landing-aws-ce.png)\n6. Run the [Get started: Query and visualize data from a notebook](https://docs.databricks.com/getting-started/quick-start.html) quickstart to familiarize yourself with Databricks.\n\n", "chunk_id": "05bd616655135789a90818e6eb001f2a", "url": "https://docs.databricks.com/getting-started/community-edition.html"} +{"chunked_text": "# \n### Sign up for Databricks Community edition\n#### Log back in to your Databricks account\n\nTo log back in to your Databricks Community Edition account, visit [community.cloud.databricks.com](https://community.cloud.databricks.com/login.html).\n\n", "chunk_id": "9388ce52c713c4e40a329bea02a405d9", "url": "https://docs.databricks.com/getting-started/community-edition.html"} +{"chunked_text": "# AI and Machine Learning on Databricks\n## What is a feature store?\n#### Work with features in Workspace Feature Store\n\nNote \nThis documentation covers the Workspace Feature Store. Databricks recommends using [Feature Engineering in Unity Catalog](https://docs.databricks.com/machine-learning/feature-store/uc/feature-tables-uc.html). Workspace Feature Store will be deprecated in the future. \nThis page describes how to create and work with feature tables in the Workspace Feature Store. 
\nNote \nIf your workspace is enabled for Unity Catalog, any table managed by Unity Catalog that has a primary key is automatically a feature table that you can use for model training and inference. All Unity Catalog capabilities, such as security, lineage, tagging, and cross-workspace access, are automatically available to the feature table. For information about working with feature tables in a Unity Catalog-enabled workspace, see [Feature Engineering in Unity Catalog](https://docs.databricks.com/machine-learning/feature-store/uc/feature-tables-uc.html). \nFor information about tracking feature lineage and freshness, see [Discover features and track feature lineage](https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/ui.html). \nNote \nDatabase and feature table names can contain only alphanumeric characters and underscores (\\_).\n\n", "chunk_id": "cb534df2c2ef0d745d418da8b6523bc4", "url": "https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html"} +{"chunked_text": "# AI and Machine Learning on Databricks\n## What is a feature store?\n#### Work with features in Workspace Feature Store\n##### Create a database for feature tables\n\nBefore creating any feature tables, you must create a [database](https://docs.databricks.com/lakehouse/data-objects.html#database) to store them. \n```\n%sql CREATE DATABASE IF NOT EXISTS \n\n``` \nFeature tables are stored as [Delta tables](https://docs.databricks.com/delta/index.html). When you create a feature table with `create_table` (Feature Store client v0.3.6 and above) or `create_feature_table` (v0.3.5 and below), you must specify the database name. For example, this argument creates a Delta table named `customer_features` in the database `recommender_system`. 
\n`name='recommender_system.customer_features'` \nWhen you publish a feature table to an online store, the default table and database name are the ones specified when you created the table; you can specify different names using the `publish_table` method. \nThe Databricks Feature Store UI shows the name of the table and database in the online store, along with other metadata.\n\n", "chunk_id": "399e7c2a73cf26702a04c2d19baadccb", "url": "https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html"} +{"chunked_text": "# AI and Machine Learning on Databricks\n## What is a feature store?\n#### Work with features in Workspace Feature Store\n##### Create a feature table in Databricks Feature Store\n\nNote \nYou can also register an existing [Delta table](https://docs.databricks.com/delta/index.html) as a feature table. See [Register an existing Delta table as a feature table](https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html#register-delta-table). \nThe basic steps to creating a feature table are: \n1. Write the Python functions to compute the features. The output of each function should be an Apache Spark DataFrame with a unique primary key. The primary key can consist of one or more columns.\n2. Create a feature table by instantiating a `FeatureStoreClient` and using `create_table` (v0.3.6 and above) or `create_feature_table` (v0.3.5 and below).\n3. Populate the feature table using `write_table`. \nFor details about the commands and parameters used in the following examples, see the [Feature Store Python API reference](https://api-docs.databricks.com/python/feature-store/latest/index.html). 
\n```\nfrom databricks.feature_store import feature_table\n\ndef compute_customer_features(data):\n    ''' Feature computation code returns a DataFrame with 'customer_id' as primary key'''\n    pass\n\n# create feature table keyed by customer_id\n# take schema from DataFrame output by compute_customer_features\nfrom databricks.feature_store import FeatureStoreClient\n\ncustomer_features_df = compute_customer_features(df)\n\nfs = FeatureStoreClient()\n\ncustomer_feature_table = fs.create_table(\nname='recommender_system.customer_features',\nprimary_keys='customer_id',\nschema=customer_features_df.schema,\ndescription='Customer features'\n)\n\n# An alternative is to use `create_table` and specify the `df` argument.\n# This code automatically saves the features to the underlying Delta table.\n\n# customer_feature_table = fs.create_table(\n# ...\n# df=customer_features_df,\n# ...\n# )\n\n# To use a composite key, pass all keys in the create_table call\n\n# customer_feature_table = fs.create_table(\n# ...\n# primary_keys=['customer_id', 'date'],\n# ...\n# )\n\n# Use write_table to write data to the feature table\n# Overwrite mode does a full refresh of the feature table\n\nfs.write_table(\nname='recommender_system.customer_features',\ndf = customer_features_df,\nmode = 'overwrite'\n)\n\n``` \n```\nfrom databricks.feature_store import feature_table\n\ndef compute_customer_features(data):\n    ''' Feature computation code returns a DataFrame with 'customer_id' as primary key'''\n    pass\n\n# create feature table keyed by customer_id\n# take schema from DataFrame output by compute_customer_features\nfrom databricks.feature_store import FeatureStoreClient\n\ncustomer_features_df = compute_customer_features(df)\n\nfs = FeatureStoreClient()\n\ncustomer_feature_table = fs.create_feature_table(\nname='recommender_system.customer_features',\nkeys='customer_id',\nschema=customer_features_df.schema,\ndescription='Customer features'\n)\n\n# An alternative is to use `create_feature_table` and 
specify the `features_df` argument.\n# This code automatically saves the features to the underlying Delta table.\n\n# customer_feature_table = fs.create_feature_table(\n# ...\n# features_df=customer_features_df,\n# ...\n# )\n\n# To use a composite key, pass all keys in the create_feature_table call\n\n# customer_feature_table = fs.create_feature_table(\n# ...\n# keys=['customer_id', 'date'],\n# ...\n# )\n\n# Use write_table to write data to the feature table\n# Overwrite mode does a full refresh of the feature table\n\nfs.write_table(\nname='recommender_system.customer_features',\ndf = customer_features_df,\nmode = 'overwrite'\n)\n\n```\n\n", "chunk_id": 
"9e46fd6738ce6ae29cbf09670b1c283d", "url": "https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html"} +{"chunked_text": "# AI and Machine Learning on Databricks\n## What is a feature store?\n#### Work with features in Workspace Feature Store\n##### Register an existing Delta table as a feature table\n\nWith v0.3.8 and above, you can register an existing [Delta table](https://docs.databricks.com/delta/index.html) as a feature table. The Delta table must exist in the metastore. \nNote \nTo [update a registered feature table](https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html#update-a-feature-table), you must use the [Feature Store Python API](https://docs.databricks.com/machine-learning/feature-store/python-api.html). \n```\nfs.register_table(\ndelta_table='recommender.customer_features',\nprimary_keys='customer_id',\ndescription='Customer features'\n)\n\n```\n\n#### Work with features in Workspace Feature Store\n##### Control access to feature tables\n\nSee [Control access to feature tables](https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/access-control.html).\n\n", "chunk_id": "69295e3ada1211009eb0b904de947ee2", "url": "https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html"} +{"chunked_text": "# AI and Machine Learning on Databricks\n## What is a feature store?\n#### Work with features in Workspace Feature Store\n##### Update a feature table\n\nYou can update a feature table by adding new features or by modifying specific rows based on the primary key. 
\nThe following feature table metadata cannot be updated: \n* Primary key\n* Partition key\n* Name or type of an existing feature \n### Add new features to an existing feature table \nYou can add new features to an existing feature table in one of two ways: \n* Update the existing feature computation function and run `write_table` with the returned DataFrame. This updates the feature table schema and merges new feature values based on the primary key.\n* Create a new feature computation function to calculate the new feature values. The DataFrame returned by this new computation function must contain the feature table\u2019s primary keys and partition keys (if defined). Run `write_table` with the DataFrame to write the new features to the existing feature table, using the same primary key. \n### Update only specific rows in a feature table \nUse `mode = \"merge\"` in `write_table`. Rows whose primary key does not exist in the DataFrame sent in the `write_table` call remain unchanged. \n```\nfs.write_table(\nname='recommender.customer_features',\ndf = customer_features_df,\nmode = 'merge'\n)\n\n``` \n### Schedule a job to update a feature table \nTo ensure that features in feature tables always have the most recent values, Databricks recommends that you [create a job](https://docs.databricks.com/workflows/jobs/create-run-jobs.html) that runs a notebook to update your feature table on a regular basis, such as every day. If you already have a non-scheduled job created, you can convert it to a [scheduled job](https://docs.databricks.com/workflows/jobs/schedule-jobs.html#job-schedule) to make sure the feature values are always up-to-date. \nCode to update a feature table uses `mode='merge'`, as shown in the following example. 
\n```\nfs = FeatureStoreClient()\n\ncustomer_features_df = compute_customer_features(data)\n\nfs.write_table(\ndf=customer_features_df,\nname='recommender_system.customer_features',\nmode='merge'\n)\n\n``` \n### Store past values of daily features \nDefine a feature table with a composite primary key. Include the date in the primary key. For example, for a feature table `store_purchases`, you might use a composite primary key (`date`, `user_id`) and partition key `date` for efficient reads. \n```\nfs.create_table(\nname='recommender_system.customer_features',\nprimary_keys=['date', 'customer_id'],\npartition_columns=['date'],\nschema=customer_features_df.schema,\ndescription='Customer features'\n)\n\n``` \nYou can then create code to read from the feature table filtering `date` to the time period of interest. \nYou can also create a [time series feature table](https://docs.databricks.com/machine-learning/feature-store/time-series.html) by specifying the `date` column as a timestamp key using the `timestamp_keys` argument. \n```\nfs.create_table(\nname='recommender_system.customer_features',\nprimary_keys=['date', 'customer_id'],\ntimestamp_keys=['date'],\nschema=customer_features_df.schema,\ndescription='Customer timeseries features'\n)\n\n``` \nThis enables point-in-time lookups when you use `create_training_set` or `score_batch`. The system performs an as-of timestamp join, using the `timestamp_lookup_key` you specify. \nTo keep the feature table up to date, set up a regularly scheduled job to write features, or stream new feature values into the feature table. \n### Create a streaming feature computation pipeline to update features \nTo create a streaming feature computation pipeline, pass a streaming `DataFrame` as an argument to `write_table`. This method returns a `StreamingQuery` object. 
\n```\ndef compute_additional_customer_features(data):\n    ''' Returns Streaming DataFrame\n    '''\n    pass # not shown\n\ncustomer_transactions = spark.readStream.load(\"dbfs:/events/customer_transactions\")\nstream_df = compute_additional_customer_features(customer_transactions)\n\nfs.write_table(\ndf=stream_df,\nname='recommender_system.customer_features',\nmode='merge'\n)\n\n```\n\n", "chunk_id": "3137c6aff45993d1ccdc889d18132fe7", "url": "https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html"} +{"chunked_text": "# AI and Machine Learning on Databricks\n## What is a feature store?\n#### Work with features in Workspace Feature Store\n##### Read from a feature table\n\nUse `read_table` to read feature values. \n```\nfs = feature_store.FeatureStoreClient()\ncustomer_features_df = fs.read_table(\nname='recommender.customer_features',\n)\n\n```\n\n#### Work with features in Workspace Feature Store\n##### Search and browse feature tables\n\nUse the Feature Store UI to search for or browse feature tables. \n1. In the sidebar, select **Machine Learning > Feature Store** to display the Feature Store UI.\n2. In the search box, enter all or part of the name of a feature table, a feature, or a data source used for feature computation. You can also enter all or part of the [key or value of a tag](https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html#work-with-feature-table-tags). Search text is case-insensitive. \n![Feature search example](https://docs.databricks.com/_images/feature-search-example.png)\n\n#### Work with features in Workspace Feature Store\n##### Get feature table metadata\n\nThe API to get feature table metadata depends on the Databricks runtime version you are using. With v0.3.6 and above, use `get_table`. With v0.3.5 and below, use `get_feature_table`. 
\n```\n# this example works with v0.3.6 and above\n# for v0.3.5, use `get_feature_table`\nfrom databricks.feature_store import FeatureStoreClient\nfs = FeatureStoreClient()\nfs.get_table(\"feature_store_example.user_feature_table\")\n\n```\n\n", "chunk_id": "daba3010389d5f7dc5546d135f27d77c", "url": "https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html"} +{"chunked_text": "# AI and Machine Learning on Databricks\n## What is a feature store?\n#### Work with features in Workspace Feature Store\n##### Work with feature table tags\n\nTags are key-value pairs that you can create and use to [search for feature tables](https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html#search-and-browse-feature-tables). You can create, edit, and delete tags using the Feature Store UI or the [Feature Store Python API](https://docs.databricks.com/machine-learning/feature-store/python-api.html). \n### Work with feature table tags in the UI \nUse the Feature Store UI to search for or browse feature tables. To access the UI, in the sidebar, select **Machine Learning > Feature Store**. \n#### Add a tag using the Feature Store UI \n1. Click ![Tag icon](https://docs.databricks.com/_images/tags1.png) if it is not already open. The tags table appears. \n![tag table](https://docs.databricks.com/_images/tags-open.png)\n2. Click in the **Name** and **Value** fields and enter the key and value for your tag.\n3. Click **Add**. \n#### Edit or delete a tag using the Feature Store UI \nTo edit or delete an existing tag, use the icons in the **Actions** column. \n![tag actions](https://docs.databricks.com/_images/tag-edit-or-delete.png) \n### Work with feature table tags using the Feature Store Python API \nOn clusters running v0.4.1 and above, you can create, edit, and delete tags using the [Feature Store Python API](https://docs.databricks.com/machine-learning/feature-store/python-api.html). 
\n#### Requirements \nFeature Store client v0.4.1 and above \n#### Create feature table with tag using the Feature Store Python API \n```\nfrom databricks.feature_store import FeatureStoreClient\nfs = FeatureStoreClient()\n\ncustomer_feature_table = fs.create_table(\n...\ntags={\"tag_key_1\": \"tag_value_1\", \"tag_key_2\": \"tag_value_2\", ...},\n...\n)\n\n``` \n#### Add, update, and delete tags using the Feature Store Python API \n```\nfrom databricks.feature_store import FeatureStoreClient\nfs = FeatureStoreClient()\n\n# Upsert a tag\nfs.set_feature_table_tag(table_name=\"my_table\", key=\"quality\", value=\"gold\")\n\n# Delete a tag\nfs.delete_feature_table_tag(table_name=\"my_table\", key=\"quality\")\n\n```\n\n", "chunk_id": "f6a3cc9bed50ebea47e9dd4ee6b7fd2c", "url": "https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html"} +{"chunked_text": "# AI and Machine Learning on Databricks\n## What is a feature store?\n#### Work with features in Workspace Feature Store\n##### Update data sources for a feature table\n\nFeature store automatically tracks the data sources used to compute features. You can also manually update the data sources by using the [Feature Store Python API](https://docs.databricks.com/machine-learning/feature-store/python-api.html). \n### Requirements \nFeature Store client v0.5.0 and above \n### Add data sources using the Feature Store Python API \nBelow are some example commands. For details, see [the API documentation](https://docs.databricks.com/machine-learning/feature-store/python-api.html). 
\n```\nfrom databricks.feature_store import FeatureStoreClient\nfs = FeatureStoreClient()\n\n# Use `source_type=\"table\"` to add a table in the metastore as data source.\nfs.add_data_sources(feature_table_name=\"clicks\", data_sources=\"user_info.clicks\", source_type=\"table\")\n\n# Use `source_type=\"path\"` to add a data source in path format.\nfs.add_data_sources(feature_table_name=\"user_metrics\", data_sources=\"dbfs:/FileStore/user_metrics.json\", source_type=\"path\")\n\n# Use `source_type=\"custom\"` if the source is not a table or a path.\nfs.add_data_sources(feature_table_name=\"user_metrics\", data_sources=\"user_metrics.txt\", source_type=\"custom\")\n\n``` \n### Delete data sources using the Feature Store Python API \nFor details, see [the API documentation](https://docs.databricks.com/machine-learning/feature-store/python-api.html). \nNote \nThe following command deletes data sources of all types (\u201ctable\u201d, \u201cpath\u201d, and \u201ccustom\u201d) that match the source names. \n```\nfrom databricks.feature_store import FeatureStoreClient\nfs = FeatureStoreClient()\nfs.delete_data_sources(feature_table_name=\"clicks\", sources_names=\"user_info.clicks\")\n\n```\n\n", "chunk_id": "9dc8febef6c3c813b7d78d96fa2b72b1", "url": "https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html"} +{"chunked_text": "# AI and Machine Learning on Databricks\n## What is a feature store?\n#### Work with features in Workspace Feature Store\n##### Delete a feature table\n\nYou can delete a feature table using the Feature Store UI or the [Feature Store Python API](https://docs.databricks.com/machine-learning/feature-store/python-api.html). \nNote \n* Deleting a feature table can lead to unexpected failures in upstream producers and downstream consumers (models, endpoints, and scheduled jobs). 
You must delete published online stores with your cloud provider.\n* When you delete a feature table using the API, the underlying Delta table is also dropped. When you delete a feature table from the UI, you must drop the underlying Delta table separately. \n### Delete a feature table using the UI \n1. On the feature table page, click ![Button Down](https://docs.databricks.com/_images/button-down.png) at the right of the feature table name and select **Delete**. If you do not have CAN MANAGE permission for the feature table, you will not see this option. \n![Select delete from drop-down menu](https://docs.databricks.com/_images/feature-store-deletion.png)\n2. In the Delete Feature Table dialog, click **Delete** to confirm.\n3. If you also want to [drop the underlying Delta table](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-drop-table.html), run the following command in a notebook. \n```\n%sql DROP TABLE IF EXISTS <feature-table-name>;\n\n``` \n### Delete a feature table using the Feature Store Python API \nWith Feature Store client v0.4.1 and above, you can use `drop_table` to delete a feature table. When you delete a table with `drop_table`, the underlying Delta table is also dropped. \n```\nfs.drop_table(\nname='recommender_system.customer_features'\n)\n\n```\n\n", "chunk_id": "e50977d0f1639c2bfda97b1ec95471b0", "url": "https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/feature-tables.html"} +{"chunked_text": "# Databricks administration introduction\n### Create and manage compute policies\n\nThis article explains how to create and manage policies in your workspace. For information on writing policy definitions, see [Compute policy reference](https://docs.databricks.com/admin/clusters/policy-definition.html). 
\nNote \nPolicies require the [Premium plan or above](https://databricks.com/product/pricing/platform-addons).\n\n### Create and manage compute policies\n#### What are compute policies?\n\nA policy is a tool workspace admins can use to limit a user or group\u2019s compute creation permissions based on a set of policy rules. \nPolicies provide the following benefits: \n* Limit users to creating clusters with prescribed settings.\n* Limit users to creating a certain number of clusters.\n* Simplify the user interface and enable more users to create their own clusters (by fixing and hiding some values).\n* Control cost by limiting per-cluster maximum cost (by setting limits on attributes whose values contribute to hourly price).\n* Enforce cluster-scoped library installations.\n\n", "chunk_id": "0fc8150c289362d85d99cbf3ca4915d2", "url": "https://docs.databricks.com/admin/clusters/policies.html"} +{"chunked_text": "# Databricks administration introduction\n### Create and manage compute policies\n#### Create a policy\n\nThese are the basic instructions to create a policy. To learn how to define a policy, see [Compute policy reference](https://docs.databricks.com/admin/clusters/policy-definition.html). \n1. Click ![compute icon](https://docs.databricks.com/_images/clusters-icon.png) **Compute** in the sidebar.\n2. Click the **Policies** tab.\n3. Click **Create policy**.\n4. Name the policy. Policy names are case insensitive.\n5. Optionally, select a policy family from the **Family** dropdown. This determines the template from which you build the policy.\n6. Enter a **Description** of the policy. This helps others know the purpose of the policy.\n7. In the **Definitions** tab, enter a [policy definition](https://docs.databricks.com/admin/clusters/policy-definition.html).\n8. In the **Libraries** tab, add any compute-scoped libraries that you want the policy to install on the compute. 
See [Add libraries to a policy](https://docs.databricks.com/admin/clusters/policies.html#libraries).\n9. In the **Permissions** tab, assign permissions for the policy and optionally set the maximum number of resources a user can create using that policy.\n10. Click **Create**.\n\n### Create and manage compute policies\n#### Use policy families\n\nWhen you create a policy, you can choose to use a policy family. Policy families are Databricks-provided policy templates with pre-populated rules, designed to address common compute use cases. \nWhen using a policy family, the rules for your policy are inherited from the policy family. After selecting a policy family, you can create the policy as-is, or choose to add rules or override the given rules. For more on policy families, see [Default policies and policy families](https://docs.databricks.com/admin/clusters/policy-families.html).\n\n", "chunk_id": "08add2d417759b315f2bd659dcbbfa71", "url": "https://docs.databricks.com/admin/clusters/policies.html"} +{"chunked_text": "# Databricks administration introduction\n### Create and manage compute policies\n#### Add libraries to a policy\n\nYou can add [libraries](https://docs.databricks.com/libraries/index.html) to a policy so libraries are automatically installed on compute resources. You can add a maximum of 500 libraries to a policy. \nNote \nYou may have previously added compute-scoped libraries using init scripts. Databricks recommends using compute policies instead of init scripts to install libraries. \nTo add a library to your policy: \n1. At the bottom of the **Create policy** page, click the **Libraries** tab.\n2. Click **Add library**.\n3. Select one of the **Library Source** options, then follow the instructions as outlined below: \n| Library source | Instructions |\n| --- | --- |\n| **Workspace** | Select a workspace file or upload a Whl, zipped wheelhouse, JAR, ZIP, tar, or requirements.txt file. 
See [Install libraries from workspace files](https://docs.databricks.com/libraries/workspace-files-libraries.html) |\n| **Volumes** | Select a Whl, JAR, or requirements.txt file from a volume. See [Install libraries from a volume](https://docs.databricks.com/libraries/volume-libraries.html). |\n| **File Path/S3** | Select the library type and provide the full URI to the library object (for example: `s3://bucket-name/path/to/library.whl`). See [Install libraries from object storage](https://docs.databricks.com/libraries/object-storage-libraries.html). |\n| **PyPI** | Enter a PyPI package name. See [PyPI package](https://docs.databricks.com/libraries/package-repositories.html#pypi-libraries). |\n| **Maven** | Specify a Maven coordinate. See [Maven or Spark package](https://docs.databricks.com/libraries/package-repositories.html#maven-libraries). |\n| **CRAN** | Enter the name of a package. See [CRAN package](https://docs.databricks.com/libraries/package-repositories.html#cran-libraries). |\n| **DBFS** (Not recommended) | Load a JAR or Whl file to the DBFS root. This is not recommended, as files stored in DBFS can be modified by any workspace user. |\n4. Click **Add**. \n### Effect of adding libraries to policies \nIf you add libraries to a policy: \n* Users can\u2019t install or uninstall compute-scoped libraries on compute that uses this policy.\n* Libraries configured through the UI, REST API, or CLI on existing compute are removed the next time the compute restarts.\n* Dependency libraries for tasks that use this policy in jobs compute resources are disabled.\n\n", "chunk_id": "ef94aece3f8f0e919f5c04b1c8bb367c", "url": "https://docs.databricks.com/admin/clusters/policies.html"} +{"chunked_text": "# Databricks administration introduction\n### Create and manage compute policies\n#### Policy permissions\n\nBy default, workspace admins have permissions on all policies. Non-admin users must be granted permissions on a policy for them to have access to the policy. 
\nIf a user has unrestricted cluster creation permissions, then they will also have access to the **Unrestricted** policy. This allows them to create fully configurable compute resources. \nIf a user doesn\u2019t have access to any policies, the policy dropdown does not display in their UI. \n### Restrict the number of compute resources per user \nPolicy permissions allow you to set a max number of compute resources per user. This determines how many resources a user can create using that policy. If the user exceeds the limit, the operation fails. \nTo restrict the number of resources a user can create using a policy, enter a value into the **Max compute resources per user** setting under the **Permissions** tab in the policies UI. \nNote \nDatabricks doesn\u2019t proactively terminate resources to maintain the limit. If a user has three compute resources running with the policy and the workspace admin reduces the limit to one, the three resources will continue to run. Extra resources must be manually terminated to comply with the limit.\n\n", "chunk_id": "01ab2a4044f8005e063a8c1832363eee", "url": "https://docs.databricks.com/admin/clusters/policies.html"} +{"chunked_text": "# Databricks administration introduction\n### Create and manage compute policies\n#### Manage a policy\n\nAfter you create a policy, you can edit, clone, and delete it. \nYou can also monitor the policy\u2019s adoption by viewing the compute resources that use the policy. From the **Policies** page, click the policy you want to view. Then click the **Compute** or **Jobs** tabs to see a list of resources that use the policy. \n### Edit a policy \nYou might want to edit a policy to update its permissions or its definitions. To edit a policy, select the policy from the Policies page then click **Edit**. After you click **Edit** you can click the **Permissions** tab to update the policy\u2019s permissions. You can also then update the policy\u2019s definition. 
\nNote \nAfter you update a policy\u2019s definitions, the compute that uses that policy does not automatically update to adhere to the new policy rules, but the policy rules will be in effect if the user attempts to edit the compute resource. \n### Clone a policy \nYou can also use the cloning feature to create a new policy from an existing policy. Open the policy you want to clone then click the **Clone** button. Then change any values of the fields that you want to modify and click **Create**. \n### Delete a policy \nSelect the policy from the Policies page then click **Delete**. When asked if you\u2019re sure you want to delete the policy, click **Delete** again. \nAny compute governed by a deleted policy can still run, but it cannot be edited unless the user has unrestricted cluster creation permissions.\n\n", "chunk_id": "4a4467945fd683276b6e1dfeec55e6ec", "url": "https://docs.databricks.com/admin/clusters/policies.html"} +{"chunked_text": "# Databricks data engineering\n## Work with files on Databricks\n#### What are workspace files?\n\nA workspace file is any file in the Databricks workspace that is not a Databricks notebook. Workspace files can be any file type. Common examples include: \n* `.py` files used in custom modules.\n* `.md` files, such as `README.md`.\n* `.csv` or other small data files.\n* `.txt` files.\n* `.whl` libraries.\n* Log files. \nWorkspace files include files formerly referred to as \u201cFiles in Repos\u201d. For recommendations on working with files, see [Recommendations for files in volumes and workspace files](https://docs.databricks.com/files/files-recommendations.html). \nImportant \nWorkspace files are enabled everywhere by default in Databricks Runtime version 11.2, but can be disabled by admins using the REST API. For production workloads, use Databricks Runtime 11.3 LTS or above. 
Contact your workspace administrator if you cannot access this functionality.\n\n", "chunk_id": "bc5f8c0feb1094e89917d87d378aaebe", "url": "https://docs.databricks.com/files/workspace.html"} +{"chunked_text": "# Databricks data engineering\n## Work with files on Databricks\n#### What are workspace files?\n##### What you can do with workspace files\n\nDatabricks provides functionality similar to local development for many workspace file types, including a built-in file editor. Not all use cases for all file types are supported. For example, while you can include images in an imported directory or repository, you cannot embed images in notebooks. \nYou can create, edit, and manage access to workspace files using familiar patterns from notebook interactions. You can use relative paths for library imports from workspace files, similar to local development. For more details, see: \n* [Workspace files basic usage](https://docs.databricks.com/files/workspace-basics.html)\n* [Programmatically interact with workspace files](https://docs.databricks.com/files/workspace-interact.html)\n* [Work with Python and R modules](https://docs.databricks.com/files/workspace-modules.html)\n* [Manage notebooks](https://docs.databricks.com/notebooks/notebooks-manage.html)\n* [File ACLs](https://docs.databricks.com/security/auth-authz/access-control/index.html#files) \nInit scripts stored in workspace files have special behavior. You can use workspace files to store and reference init scripts in any Databricks Runtime version. See [Store init scripts in workspace files](https://docs.databricks.com/files/workspace-init-scripts.html). \nNote \nIn Databricks Runtime 14.0 and above, the default current working directory (CWD) for code executed locally is the directory containing the notebook or script being run. This is a change in behavior from Databricks Runtime 13.3 LTS and below. 
See [What is the default current working directory?](https://docs.databricks.com/files/cwd-dbr-14.html).\n\n", "chunk_id": "fc2e27adb1845a535fabdef716cade0e", "url": "https://docs.databricks.com/files/workspace.html"} +{"chunked_text": "# Databricks data engineering\n## Work with files on Databricks\n#### What are workspace files?\n##### Limitations\n\nA complete list of workspace files limitations is found in [Workspace files limitations](https://docs.databricks.com/files/index.html#workspace-files-limitations). \n### File size limit \nIndividual workspace files are limited to 500 MB. \n### Databricks Runtime versions for files in Git folders with a cluster with Databricks Container Services \nOn clusters running Databricks Runtime 11.3 LTS and above, the default settings allow you to use workspace files in Git folders with Databricks Container Services (DCS). \nOn clusters running Databricks Runtime versions 10.4 LTS and 9.1 LTS, you must configure the dockerfile to access workspace files in Git folders on a cluster with DCS. 
Refer to the following dockerfiles for the desired Databricks Runtime version: \n* [Dockerfile for DBR 10.4 LTS](https://github.com/databricks/containers/tree/release-10.4-LTS/experimental/ubuntu/files-in-repos)\n* [Dockerfile for DBR 9.1 LTS](https://github.com/databricks/containers/tree/release-9.1-LTS/experimental/ubuntu/files-in-repos) \nSee [Customize containers with Databricks Container Service](https://docs.databricks.com/compute/custom-containers.html).\n\n", "chunk_id": "b4f00697d1397317a7ce82e74f8eb401", "url": "https://docs.databricks.com/files/workspace.html"} +{"chunked_text": "# Databricks data engineering\n## Work with files on Databricks\n#### What are workspace files?\n##### Enable workspace files\n\nTo enable support for non-notebook files in your Databricks workspace, call the [/api/2.0/workspace-conf](https://docs.databricks.com/api/workspace/workspaceconf/setstatus) REST API from a notebook or other environment with access to your Databricks workspace. Workspace files are **enabled** by default. \nTo enable or re-enable support for non-notebook files in your Databricks workspace, call the `/api/2.0/workspace-conf` API and get the value of the `enableWorkspaceFileSystem` key. If it is set to `true`, non-notebook files are already enabled for your workspace. \nThe following example demonstrates how you can call this API from a notebook to check if workspace files are disabled and, if so, re-enable them. 
\n### Example: Notebook for re-enabling Databricks workspace file support \n[Open notebook in new tab](https://docs.databricks.com/_extras/notebooks/source/files/turn-on-files.html)\n\n", "chunk_id": "436caf4ef11dee379989377fff62f09d", "url": "https://docs.databricks.com/files/workspace.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### April 2024\n\nThese features and Databricks platform improvements were released in April 2024. \nNote \nReleases are staged. Your Databricks account might not be updated until a week or more after the initial release date.\n\n#### April 2024\n##### Databricks Runtime 15.1 is GA\n\n**April 30, 2024** \nDatabricks Runtime 15.1 and Databricks Runtime 15.1 ML are now generally available. \nSee [Databricks Runtime 15.1](https://docs.databricks.com/release-notes/runtime/15.1.html) and [Databricks Runtime 15.1 for Machine Learning](https://docs.databricks.com/release-notes/runtime/15.1ml.html).\n\n#### April 2024\n##### Databricks Assistant: Threads & history\n\n**April 29, 2024** \nDatabricks Assistant now offers an opt-in experience that tracks query threads and history throughout editor contexts in your session. You can continue in the same thread, or start a new one if you want. See [Enable or disable Databricks Assistant](https://docs.databricks.com/notebooks/databricks-assistant-faq.html#enable-or-disable) for how to enable this experience.\n\n#### April 2024\n##### Cancel pending serving endpoint updates in Model Serving\n\n**April 29, 2024** \nDatabricks Model Serving now supports cancelling in-progress endpoint configuration updates from the Serving UI. 
See [Modify a custom model endpoint](https://docs.databricks.com/machine-learning/model-serving/create-manage-serving-endpoints.html#endpoint-config).\n\n", "chunk_id": "f34e6c35295d7bdf50c05be16ab2b695", "url": "https://docs.databricks.com/release-notes/product/2024/april.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### April 2024\n##### Data lineage now captures reads on tables with column masks and row-level security\n\n**April 25, 2024** \nData lineage now captures reads on tables that include column masks and row-level security. See [Filter sensitive table data using row filters and column masks](https://docs.databricks.com/data-governance/unity-catalog/row-and-column-filters.html) and [Capture and view data lineage using Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/data-lineage.html).\n\n#### April 2024\n##### Meta Llama 3 is supported in Model Serving for AWS\n\n**April 24, 2024** \nDatabricks Model Serving now supports Meta Llama 3, a dense model architecture built and trained by Meta. The Meta-Llama-3-70B-Instruct model is available as part of Foundation Model APIs in pay-per-token serving endpoint regions. See [Use Foundation Model APIs](https://docs.databricks.com/machine-learning/foundation-models/index.html#use-foundation-apis).\n\n#### April 2024\n##### Notebooks now automatically detect SQL\n\n**April 24, 2024** \nDatabricks notebooks automatically add the `%sql` magic command when you enter SQL in a notebook set to a default language other than SQL. Previously, to prevent an error, you needed to manually add the `%sql` magic command.\n\n#### April 2024\n##### New columns added to the billable usage system table (Public Preview)\n\n**April 24, 2024** \nThe billable usage system table (`system.billing.usage`) now includes new columns that help you identify the specific product and features associated with the usage. 
See [Billable usage system table reference](https://docs.databricks.com/admin/system-tables/billing.html).\n\n", "chunk_id": "de2c603007c4d124694f8d9b98e7b30e", "url": "https://docs.databricks.com/release-notes/product/2024/april.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### April 2024\n##### Delta Sharing supports tables that use column mapping (Public Preview)\n\n**April 23, 2024** \nDelta Sharing now supports the sharing of tables that use [column mapping](https://docs.databricks.com/delta/delta-column-mapping.html). Recipients can read tables that use column mapping using a SQL warehouse, a cluster running Databricks Runtime 14.1 or above, or compute that is running open source `delta-sharing-spark` 3.1 or above. \nSee [Add tables with deletion vectors or column mapping to a share](https://docs.databricks.com/data-sharing/create-share.html#deletion-vectors) and [Read tables with deletion vectors or column mapping enabled](https://docs.databricks.com/data-sharing/read-data-databricks.html#deletion-vectors).\n\n#### April 2024\n##### Get serving endpoint schemas (Public Preview)\n\n**April 22, 2024** \nYou can now get the serving endpoint schema for your model serving or feature serving endpoints. This capability is available in Public Preview. 
See [Get a model serving endpoint schema](https://docs.databricks.com/machine-learning/model-serving/manage-serving-endpoints.html#get-schema).\n\n", "chunk_id": "63433f14a34d2f76ed7d750b5f1ef528", "url": "https://docs.databricks.com/release-notes/product/2024/april.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### April 2024\n##### Creation and installation of workspace libraries is no longer available\n\n**April 22, 2024** \n[Workspace libraries](https://docs.databricks.com/archive/legacy/workspace-libraries.html), which have been deprecated since November 2023, can no longer be created or installed on compute. Note that storing libraries as workspace files is distinct from workspace libraries and is still fully supported. You can install libraries stored as workspace files directly to compute or job tasks. See [Libraries](https://docs.databricks.com/libraries/index.html).\n\n#### April 2024\n##### Jobs created through the UI are now queued by default\n\n**April 18, 2024** \nQueueing of job runs is now automatically enabled when a job is created in the Databricks Jobs UI. When queueing is enabled, and a concurrency limit is reached, job runs are placed in a queue until capacity is available. See [What if my job cannot run because of concurrency limits?](https://docs.databricks.com/workflows/jobs/create-run-jobs.html#job-queueing).\n\n#### April 2024\n##### Configuring access to resources from serving endpoints is GA\n\n**April 17, 2024** \nYou can now configure environment variables to access resources outside of your feature serving and model serving endpoints. This capability is generally available. 
See [Configure access to resources from model serving endpoints](https://docs.databricks.com/machine-learning/model-serving/store-env-variable-model-serving.html).\n\n", "chunk_id": "9480aa7da6e01bb2074c857c2652fda3", "url": "https://docs.databricks.com/release-notes/product/2024/april.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### April 2024\n##### Serverless compute for workflows is in public preview\n\n**April 16, 2024** \nServerless compute for workflows allows you to run your Databricks job without configuring and deploying infrastructure. With serverless compute for workflows, Databricks efficiently manages the compute resources that run your job, including optimizing and scaling compute for your workloads. See [Run your Databricks job with serverless compute for workflows](https://docs.databricks.com/workflows/jobs/run-serverless-jobs.html).\n\n#### April 2024\n##### Lakehouse Federation supports foreign tables with case-sensitive identifiers\n\n**April 15, 2024** \nLakehouse Federation is now able to federate foreign tables with case-sensitive identifiers for MySQL, SQL Server, BigQuery, Snowflake, and Postgres connections. See [What is Lakehouse Federation](https://docs.databricks.com/query-federation/index.html).\n\n#### April 2024\n##### Compute cloning now clones any libraries installed on the original compute\n\n**April 11, 2024** \nWhen cloning compute, any libraries installed on the original compute will also be cloned. For cases where this behavior is unwanted, there is an alternative **Create without libraries** button on the compute clone page. See [Clone a compute](https://docs.databricks.com/compute/clusters-manage.html#cluster-clone).\n\n#### April 2024\n##### Route optimization is available for serving endpoints\n\n**April 10, 2024** \nYou can now create route optimized serving endpoints for your model serving or feature serving workflows. 
See [Configure route optimization on serving endpoints](https://docs.databricks.com/machine-learning/model-serving/route-optimization.html).\n\n", "chunk_id": "9a3ea94e7d28a13e00b8b50583306d80", "url": "https://docs.databricks.com/release-notes/product/2024/april.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### April 2024\n##### Delta Live Tables notebook developer experience improvements (Public Preview)\n\n**April 3, 2024** \nDatabricks has released a set of Public Preview features in Databricks notebooks that assist in the development and debugging of Delta Live Tables code. See [Develop and debug Delta Live Tables pipelines in notebooks](https://docs.databricks.com/delta-live-tables/dlt-notebook-devex.html).\n\n#### April 2024\n##### Databricks on AWS GovCloud (Public Preview)\n\n**April 1, 2024** \nAWS GovCloud gives United States government customers and their partners the flexibility to architect secure cloud solutions that comply with the FedRAMP High baseline and other compliance regimes. Databricks on AWS GovCloud provides the Databricks platform deployed in AWS GovCloud with compliance and security controls. See [Databricks on AWS GovCloud](https://docs.databricks.com/security/privacy/gov-cloud.html).\n\n", "chunk_id": "23fcdc9e5b444f0f959bceefef69cac5", "url": "https://docs.databricks.com/release-notes/product/2024/april.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### December 2022\n\nThese features and Databricks platform improvements were released in December 2022. \nNote \nReleases are staged. Your Databricks account might not be updated until a week or more after the initial release date.\n\n#### December 2022\n##### Databricks SQL Driver for Go is Generally Available\n\n**December 21, 2022** \nDatabricks now provides full support for the Databricks SQL Driver for Go. The new version number is 1.0.0. \nChanges include: \n* Direct results are now enabled. 
If `maxRows` is omitted when initializing the connection, it defaults to 100,000.\n* Context cancellation and timeout is now supported. If the underlying context of a query is cancelled or timed-out, the query will be cancelled on the server.\n* Client-side logging functionality allows users to set the verbosity of the logger and set various identifiers to track the lifetime of queries, connections, and requests.\n* `dbsql.NewConnector()` added to help initialize database connection.\n* Setting the initial namespace and session parameters including `timezone` is now supported.\n* Comprehensive usage examples, bug fixes, and performance improvements added. \nSee the full [changelog](https://github.com/databricks/databricks-sql-go/blob/main/CHANGELOG.md).\n\n#### December 2022\n##### Prevent concurrent workspace updates\n\n**December 15, 2022** \nA new feature prevents data corruption resulting from concurrent updates to a workspace. \nWhen an admin attempts to create, update, or delete a workspace through the account console or REST API, they receive an error message if there is already an ongoing operation on the same workspace. This prevents conflicting workspace updates and data corruption. The admin can wait and retry the operation after the other update finishes.\n\n", "chunk_id": "15db519d7d78edb0c672cdfb7f213635", "url": "https://docs.databricks.com/release-notes/product/2022/december.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### December 2022\n##### Databricks Terraform provider updated to version 1.7.0\n\n**December 14, 2022** \nVersion 1.7.0 adds the new data source `databricks_cluster_policy`, enables exporting of the `databricks_service_principal` and `databricks_sql_global_config` resources, and more. 
For details, see the changelogs for version [1.7.0](https://github.com/databricks/terraform-provider-databricks/blob/v1.7.0/CHANGELOG.md).\n\n#### December 2022\n##### Databricks Runtime 12.0 and 12.0 ML are GA\n\n**December 14, 2022** \nDatabricks Runtime 12.0 and 12.0 ML are now generally available. \nSee [Databricks Runtime 12.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/12.0.html) and [Databricks Runtime 12.0 for Machine Learning (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/12.0ml.html).\n\n#### December 2022\n##### Jobs are now available in global search\n\n**December 6-12, 2022** \nYou can now search for jobs in the top bar of your Databricks workspace.\n\n#### December 2022\n##### Billable usage graphs can now aggregate by individual tags\n\n**December 6, 2022** \nThe Databricks [billable usage graphs](https://docs.databricks.com/admin/account-settings/usage.html) in the account console can now aggregate usage by individual tags. The billable usage CSV reports downloaded from the same page also include default and custom tags.\n\n", "chunk_id": "8dc9bd67f3261ea20489d2cf1f939267", "url": "https://docs.databricks.com/release-notes/product/2022/december.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### December 2022\n##### Use SQL to specify schema- and catalog-level storage locations for Unity Catalog managed tables\n\n**December 6, 2022** \nYou can now use the `MANAGED LOCATION` SQL command to specify a cloud storage location for managed tables at the catalog and schema levels. Requires Databricks Runtime 11.3 and above. 
See [CREATE CATALOG](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-catalog.html) and [CREATE SCHEMA](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-schema.html).\n\n#### December 2022\n##### Capturing lineage data with Unity Catalog is now generally available\n\n**December 6, 2022** \nDatabricks is pleased to announce the general availability of Unity Catalog support for capturing and viewing lineage data. See [Capture and view data lineage using Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/data-lineage.html).\n\n#### December 2022\n##### Databricks ODBC driver 2.6.29\n\n**December 5, 2022** \nVersion 2.6.29 of the Databricks ODBC driver ([download](https://databricks.com/spark/odbc-drivers-download)) is now available. This release adds support for Unity catalog\u2019s primary and foreign keys, thereby enabling BI tools that use the ODBC driver to discover relationships between tables during data modeling. \nThis release also resolves the following issue: \n* The driver previously included cloud fetch\u2019s presigned URLs in its log file when `EnableCurlDebugLogging` is set. These presigned URLs included sensitive tokens. Driver log entries now exclude these tokens. \nFor full details, see `release-notes.txt` included with the driver.\n\n", "chunk_id": "b54ccebb036866a8050b5a9724a83159", "url": "https://docs.databricks.com/release-notes/product/2022/december.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### December 2022\n##### Databricks JDBC driver 2.6.32\n\n**December 5, 2022** \nVersion 2.6.32 of the Databricks JDBC driver ([download](https://databricks.com/spark/jdbc-drivers-download) and [Maven](https://search.maven.org/search?q=databricks-jdbc)) is now available. 
This release updates the Jackson JSON parser libraries for enhanced security: \n* jackson-annotations-2.13.4 (previously 2.13.2) \n* jackson-core-2.13.4 (previously 2.13.2)\n* jackson-databind-2.13.4.2 (previously 2.13.2.2) \nThis release also resolves the following issues: \n* When using cloud fetch, the driver now cleans up certain resources properly. \n* The driver previously included cloud fetch\u2019s presigned URLs in its log file. These presigned URLs included sensitive tokens. These log entries now exclude these tokens.\n* The driver logs a SQL statement whenever its execution hits an exception. Statements such as `COPY INTO` might contain secrets (for example `WITH CREDENTIAL`). Driver log entries now exclude such credentials. \nFor full details, see `release-notes.txt` included with the driver.\n\n#### December 2022\n##### Partner Connect supports connecting to AtScale\n\n**December 1-6, 2022** \nYou can now easily create a connection between AtScale and your Databricks workspace using Partner Connect. For more information, see [Connect to AtScale using Partner Connect](https://docs.databricks.com/partners/semantic-layer/atscale.html#partner-connect).\n\n", "chunk_id": "fec814da3db0a81a7be4e7f4192533f3", "url": "https://docs.databricks.com/release-notes/product/2022/december.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### December 2022\n##### Improved serverless SQL warehouse support for customer-managed keys\n\n**December 1, 2022** \n[Serverless SQL warehouses](https://docs.databricks.com/compute/sql-warehouse/index.html) now support adding customer-managed keys to your workspace for [managed services](https://docs.databricks.com/security/keys/customer-managed-keys.html#managed-services) and [your workspace\u2019s root S3 bucket](https://docs.databricks.com/security/keys/customer-managed-keys.html). 
\nServerless SQL warehouses do not use customer-managed keys for [EBS storage for compute nodes](https://docs.databricks.com/security/keys/customer-managed-keys.html), which is an optional part of the customer-managed keys for workspace storage feature. Disks for serverless compute resources are short-lived and tied to the lifecycle of the serverless workload. For example, when serverless SQL warehouses are stopped or scaled down, the VMs and their storage are destroyed. See [Serverless compute and customer-managed keys](https://docs.databricks.com/security/keys/customer-managed-keys.html#serverless).\n\n", "chunk_id": "5086de5cd02ada13ed78a892e03ecddc", "url": "https://docs.databricks.com/release-notes/product/2022/december.html"} +{"chunked_text": "# Security and compliance guide\n## Secret management\n#### Secret workflow example\n\nIn this workflow example, we use secrets to set up JDBC credentials for connecting to an Azure Data Lake Store.\n\n#### Secret workflow example\n##### Create a secret scope\n\nCreate a secret scope called `jdbc`. \n```\ndatabricks secrets create-scope jdbc\n\n``` \nNote \nIf your account does not have the [Premium plan or above](https://databricks.com/product/pricing/platform-addons), you must create the scope with MANAGE permission granted to all users (\u201cusers\u201d). For example: \n```\ndatabricks secrets create-scope jdbc --initial-manage-principal users\n\n```\n\n#### Secret workflow example\n##### Create secrets\n\nAdd the secrets `username` and `password`. Run the following commands and enter the secret values in the opened editor. 
\n```\ndatabricks secrets put-secret jdbc username\ndatabricks secrets put-secret jdbc password\n\n```\n\n#### Secret workflow example\n##### Use the secrets in a notebook\n\nIn a notebook, read the secrets that are stored in the secret scope `jdbc` to configure a JDBC connector: \n```\nval driverClass = \"com.microsoft.sqlserver.jdbc.SQLServerDriver\"\nval connectionProperties = new java.util.Properties()\nconnectionProperties.setProperty(\"Driver\", driverClass)\n\nval jdbcUsername = dbutils.secrets.get(scope = \"jdbc\", key = \"username\")\nval jdbcPassword = dbutils.secrets.get(scope = \"jdbc\", key = \"password\")\nconnectionProperties.put(\"user\", s\"${jdbcUsername}\")\nconnectionProperties.put(\"password\", s\"${jdbcPassword}\")\n\n``` \nYou can now use these `connectionProperties` with the JDBC connector to talk to your data source.\nThe values fetched from the scope are never displayed in the notebook (see [Secret redaction](https://docs.databricks.com/security/secrets/redaction.html)).\n\n", "chunk_id": "0e7fad214577d1c39e2b54f6e268993b", "url": "https://docs.databricks.com/security/secrets/example-secret-workflow.html"} +{"chunked_text": "# Security and compliance guide\n## Secret management\n#### Secret workflow example\n##### Grant access to another group\n\nNote \nThis step requires that your account have the [Premium plan or above](https://databricks.com/product/pricing/platform-addons). \nAfter verifying that the credentials were configured correctly, share these credentials with the `datascience` group to use for their analysis by granting them permissions to read the secret scope and list the available secrets. 
\nGrant the `datascience` group the READ permission to these credentials by making the following request: \n```\ndatabricks secrets put-acl jdbc datascience READ\n\n``` \nFor more information about secret access control, see [Secret ACLs](https://docs.databricks.com/security/auth-authz/access-control/index.html#secrets).\n\n", "chunk_id": "411acaf222168371fc2e28fbf3e5a43b", "url": "https://docs.databricks.com/security/secrets/example-secret-workflow.html"} +{"chunked_text": "# Technology partners\n## Connect to data prep partners using Partner Connect\n#### Connect to dbt Cloud\n\ndbt (data build tool) is a development environment that enables data analysts and data engineers to transform data by simply writing select statements. dbt handles turning these select statements into tables and views. dbt compiles your code into raw SQL and then runs that code on the specified database in Databricks. dbt supports collaborative coding patterns and best practices such as version control, documentation, and modularity. \ndbt does not extract or load data. dbt focuses on the transformation step only, using a \u201ctransform after load\u201d architecture. dbt assumes that you already have a copy of your data in your database. \nThis article focuses on dbt Cloud. dbt Cloud comes equipped with turnkey support for scheduling jobs, CI/CD, serving documentation, monitoring and alerting, and an integrated development environment (IDE). \nA local version of dbt called dbt Core is also available. dbt Core enables you to write dbt code in the text editor or IDE of your choice on your local development machine and then run dbt from the command line. dbt Core includes the dbt Command Line Interface (CLI). The dbt CLI is free to use and open source. For more information, see [Connect to dbt Core](https://docs.databricks.com/partners/prep/dbt.html). 
\nBecause dbt Cloud and dbt Core can use hosted git repositories (for example, on GitHub, GitLab or BitBucket), you can use dbt Cloud to create a dbt project and then make it available to your dbt Cloud and dbt Core users. For more information, see [Creating a dbt project](https://docs.getdbt.com/docs/building-a-dbt-project/projects#creating-a-dbt-project) and [Using an existing project](https://docs.getdbt.com/docs/building-a-dbt-project/projects#using-an-existing-project) on the dbt website. \nFor a general overview of dbt, watch the following YouTube video (26 minutes).\n\n", "chunk_id": "da27591937be9ebf4b5a1510c84fd51b", "url": "https://docs.databricks.com/partners/prep/dbt-cloud.html"} +{"chunked_text": "# Technology partners\n## Connect to data prep partners using Partner Connect\n#### Connect to dbt Cloud\n##### Connect to dbt Cloud using Partner Connect\n\nThis section describes how to connect a Databricks SQL warehouse to dbt Cloud using Partner Connect, then give dbt Cloud read access to your data. \n### Differences between standard connections and dbt Cloud \nTo connect to dbt Cloud using Partner Connect, you follow the steps in [Connect to data prep partners using Partner Connect](https://docs.databricks.com/partner-connect/prep.html). The dbt Cloud connection is different from standard data preparation and transformation connections in the following ways: \n* In addition to a service principal and a personal access token, Partner Connect creates a SQL warehouse (formerly SQL endpoint) named **DBT\\_CLOUD\\_ENDPOINT** by default. \n### Steps to connect \nTo connect to dbt Cloud using Partner Connect, do the following: \n1. [Connect to data prep partners using Partner Connect](https://docs.databricks.com/partner-connect/prep.html).\n2. After you connect to dbt Cloud, your dbt Cloud dashboard appears. 
To explore your dbt Cloud project, in the menu bar, next to the dbt logo, select your dbt account name from the first drop-down if it is not displayed, and then select the **Databricks Partner Connect Trial** project from the second drop-down menu if it is not displayed. \nTip \nTo view your project\u2019s settings, click the \u201cthree stripes\u201d or \u201chamburger\u201d menu, click **Account Settings > Projects**, and click the name of the project. To view the connection settings, click the link next to **Connection**. To change any settings, click **Edit**. \nTo view the Databricks personal access token information for this project, click the \u201cperson\u201d icon on the menu bar, click **Profile > Credentials > Databricks Partner Connect Trial**, and click the name of the project. To make a change, click **Edit**. \n### Steps to give dbt Cloud read access to your data \nPartner Connect gives create-only permission to the **DBT\\_CLOUD\\_USER** service principal only on the default catalog. Follow these steps in your Databricks workspace to give the **DBT\\_CLOUD\\_USER** service principal read access to the data that you choose. \nWarning \nYou can adapt these steps to give dbt Cloud additional access across catalogs, databases, and tables within your workspace. However, as a security best practice, Databricks strongly recommends that you give access only to the individual tables that you need the **DBT\\_CLOUD\\_USER** service principal to work with and only read access to those tables. \n1. Click ![Catalog icon](https://docs.databricks.com/_images/data-icon.png) **Catalog** in the sidebar.\n2. Select the SQL warehouse (**DBT\\_CLOUD\\_ENDPOINT**) in the drop-down list at the top right. \n![Select warehouse](https://docs.databricks.com/_images/select-endpoint.png) \n1. Under **Catalog Explorer**, select the catalog that contains the database for your table.\n2. Select the database that contains your table.\n3. 
Select your table.\nTip \nIf you do not see your catalog, database, or table listed, enter any portion of the name in the **Select Catalog**, **Select Database**, or **Filter tables** boxes, respectively, to narrow down the list. \n![Filter tables](https://docs.databricks.com/_images/filter-tables.png)\n3. Click **Permissions**.\n4. Click **Grant**.\n5. For **Type to add multiple users or groups**, select **DBT\\_CLOUD\\_USER**. This is the Databricks service principal that Partner Connect created for you in the previous section. \nTip \nIf you do not see **DBT\\_CLOUD\\_USER**, begin typing `DBT_CLOUD_USER` in the **Type to add multiple users or groups** box until it appears in the list, and then select it.\n6. Grant read access only by selecting `SELECT` and `READ METADATA`.\n7. Click **OK**. \nRepeat steps 4-9 for each additional table that you want to give dbt Cloud read access to. \n### Troubleshoot the dbt Cloud connection \nIf someone deletes the project in dbt Cloud for this account, and you then click the **dbt** tile, an error message appears, stating that the project cannot be found. To fix this, click **Delete connection**, and then start from the beginning of this procedure to create the connection again.\n\n", "chunk_id": "7e853e8141ea3034ef2091a11df6fc86", "url": "https://docs.databricks.com/partners/prep/dbt-cloud.html"} +{"chunked_text": "# Technology partners\n## Connect to data prep partners using Partner Connect\n#### Connect to dbt Cloud\n##### Connect to dbt Cloud manually\n\nThis section describes how to connect a Databricks cluster or a Databricks SQL warehouse in your Databricks workspace to dbt Cloud. \nImportant \nDatabricks recommends connecting to a SQL warehouse. If you don\u2019t have the Databricks SQL access entitlement, or if you want to run Python models, you can connect to a cluster instead. \n### Requirements \n* A cluster or SQL warehouse in your Databricks workspace. 
\n+ [Compute configuration reference](https://docs.databricks.com/compute/configure.html).\n+ [Create a SQL warehouse](https://docs.databricks.com/compute/sql-warehouse/create.html).\n* The connection details for your cluster or SQL warehouse, specifically the **Server Hostname**, **Port**, and **HTTP Path** values. \n+ [Get connection details for a Databricks compute resource](https://docs.databricks.com/integrations/compute-details.html).\n* A Databricks [personal access token](https://docs.databricks.com/dev-tools/auth/pat.html). To create a personal access token, do the following: \n1. In your Databricks workspace, click your Databricks username in the top bar, and then select **Settings** from the drop down.\n2. Click **Developer**.\n3. Next to **Access tokens**, click **Manage**.\n4. Click **Generate new token**.\n5. (Optional) Enter a comment that helps you to identify this token in the future, and change the token\u2019s default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the **Lifetime (days)** box empty (blank).\n6. Click **Generate**.\n7. Copy the displayed token to a secure location, and then click **Done**.\nNote \nBe sure to save the copied token in a secure location. Do not share your copied token with others. If you lose the copied token, you cannot regenerate that exact same token. Instead, you must repeat this procedure to create a new token. If you lose the copied token, or you believe that the token has been compromised, Databricks strongly recommends that you immediately delete that token from your workspace by clicking the trash can (**Revoke**) icon next to the token on the **Access tokens** page. \nIf you are not able to create or use tokens in your workspace, this might be because your workspace administrator has disabled tokens or has not given you permission to create or use tokens. 
See your workspace administrator or the following: \n+ [Enable or disable personal access token authentication for the workspace](https://docs.databricks.com/admin/access-control/tokens.html#enable-tokens)\n+ [Personal access token permissions](https://docs.databricks.com/security/auth-authz/api-access-permissions.html#pat) \nNote \nAs a security best practice when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use [OAuth tokens](https://docs.databricks.com/dev-tools/auth/oauth-m2m.html). \nIf you use personal access token authentication, Databricks recommends using personal access tokens belonging to [service principals](https://docs.databricks.com/admin/users-groups/service-principals.html) instead of workspace users. To create tokens for service principals, see [Manage tokens for a service principal](https://docs.databricks.com/admin/users-groups/service-principals.html#personal-access-tokens). \n* To connect dbt Cloud to data managed by Unity Catalog, dbt version 1.1 or above. \nThe steps in this article create a new environment that uses the latest dbt version. For information about upgrading the dbt version for an existing environment, see [Upgrading to the latest version of dbt in Cloud](https://docs.getdbt.com/docs/dbt-versions/upgrade-core-in-cloud#upgrading-to-the-latest-version-of-dbt-in-cloud) in the dbt documentation. \n### Step 1: Sign up for dbt Cloud \nGo to [dbt Cloud - Signup](https://www.getdbt.com/signup/) and enter your email, name, and company information. Create a password and click **Create my account**. \n### Step 2: Create a dbt project \nIn this step, you create a dbt *project*, which contains a connection to a Databricks cluster or a SQL warehouse, a repository that contains your source code, and one or more environments (such as testing and production environments). \n1. [Sign in to dbt Cloud](https://cloud.getdbt.com/login/).\n2. 
Click the settings icon, and then click **Account Settings**.\n3. Click **New Project**.\n4. For **Name**, enter a unique name for your project, and then click **Continue**.\n5. For **Choose a connection**, click **Databricks**, and then click **Next**.\n6. For **Name**, enter a unique name for this connection.\n7. For **Select Adapter**, click **Databricks (dbt-databricks)**. \nNote \nDatabricks recommends using `dbt-databricks`, which supports Unity Catalog, instead of `dbt-spark`. By default, new projects use `dbt-databricks`. To migrate an existing project to `dbt-databricks`, see [Migrating from dbt-spark to dbt-databricks](https://docs.getdbt.com/guides/migration/tools/migrating-from-spark-to-databricks) in the dbt documentation.\n8. Under **Settings**, for **Server Hostname**, enter the server hostname value from the requirements.\n9. For **HTTP Path**, enter the HTTP path value from the requirements.\n10. If your workspace is Unity Catalog-enabled, under **Optional Settings**, enter the name of the catalog for dbt Cloud to use.\n11. Under **Development Credentials**, for **Token**, enter the personal access token from the requirements.\n12. For **Schema**, enter the name of the schema where you want dbt Cloud to create the tables and views (for example, `default`).\n13. Click **Test Connection**.\n14. If the test succeeds, click **Next**. \nFor more information, see [Connecting to Databricks ODBC](https://docs.getdbt.com/docs/dbt-cloud/cloud-configuring-dbt-cloud/connecting-your-database#connecting-to-databricks) on the dbt website. \nTip \nTo view or change the settings for this project, or to delete the project altogether, click the settings icon, click **Account Settings > Projects**, and click the name of the project. To change the settings, click **Edit**. To delete the project, click **Edit > Delete Project**. 
\nTo view or change your Databricks personal access token value for this project, click the \u201cperson\u201d icon, click **Profile > Credentials**, and click the name of the project. To make a change, click **Edit**. \nAfter you connect to a Databricks cluster or a Databricks SQL warehouse, follow the on-screen instructions to **Setup a Repository**, and then click **Continue**. \nAfter you set up the repository, follow the on-screen instructions to invite users and then click **Complete**. Or click **Skip & Complete**.\n\n", "chunk_id": "d86ab09cd49d1d5104a231a801b41240", "url": "https://docs.databricks.com/partners/prep/dbt-cloud.html"} +{"chunked_text": "# Technology partners\n## Connect to data prep partners using Partner Connect\n#### Connect to dbt Cloud\n##### Tutorial\n\nIn this section, you use your dbt Cloud project to work with some sample data. This section assumes that you have already created your project and have the dbt Cloud IDE open to that project. \n### Step 1: Create and run models \nIn this step, you use the dbt Cloud IDE to create and run *models*, which are `select` statements that create either a new view (the default) or a new table in a database, based on existing data in that same database. This procedure creates a model based on the sample `diamonds` table from the [Sample datasets](https://docs.databricks.com/discover/databricks-datasets.html). \nUse the following code to create this table. \n```\nDROP TABLE IF EXISTS diamonds;\n\nCREATE TABLE diamonds USING CSV OPTIONS (path \"/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv\", header \"true\")\n\n``` \nThis procedure assumes this table has already been created in your workspace\u2019s `default` database. \n1. With the project open, click **Develop** at the top of the UI.\n2. Click **Initialize dbt project**.\n3. Click **Commit and sync**, enter a commit message, and then click **Commit**.\n4. 
Click **Create branch**, enter a name for your branch, and then click **Submit**.\n5. Create the first model: Click **Create New File**.\n6. In the text editor, enter the following SQL statement. This statement selects only the carat, cut, color, and clarity details for each diamond from the `diamonds` table. The `config` block instructs dbt to create a table in the database based on this statement. \n```\n{{ config(\nmaterialized='table',\nfile_format='delta'\n) }}\n\n``` \n```\nselect carat, cut, color, clarity\nfrom diamonds\n\n``` \nTip \nFor additional `config` options such as the `merge` incremental strategy, see [Databricks configurations](https://docs.getdbt.com/reference/resource-configs/databricks-configs) in the dbt documentation.\n7. Click **Save As**.\n8. For the filename, enter `models/diamonds_four_cs.sql` and then click **Create**.\n9. Create a second model: Click ![Create New File icon](https://docs.databricks.com/_images/dbt-cloud-create-new-file.png) (**Create New File**) in the upper-right corner.\n10. In the text editor, enter the following SQL statement. This statement selects unique values from the `color` column in the `diamonds_four_cs` table, sorting the results in alphabetical order first to last. Because there is no `config` block, this model instructs dbt to create a view in the database based on this statement. \n```\nselect distinct color\nfrom diamonds_four_cs\norder by color asc\n\n```\n11. Click **Save As**.\n12. For the filename, enter `models/diamonds_list_colors.sql`, and then click **Create**.\n13. Create a third model: Click ![Create New File icon](https://docs.databricks.com/_images/dbt-cloud-create-new-file.png) (**Create New File**) in the upper-right corner.\n14. In the text editor, enter the following SQL statement. This statement averages diamond prices by color, sorting the results by average price from highest to lowest. This model instructs dbt to create a view in the database based on this statement. 
\n```\nselect color, avg(price) as price\nfrom diamonds\ngroup by color\norder by price desc\n\n```\n15. Click **Save As**.\n16. For the filename, enter `models/diamonds_prices.sql` and click **Create**.\n17. Run the models: In the command line, run the `dbt run` command with the paths to the three preceding files. In the `default` database, dbt creates one table named `diamonds_four_cs` and two views named `diamonds_list_colors` and `diamonds_prices`. dbt gets these view and table names from their related `.sql` file names. \n```\ndbt run --model models/diamonds_four_cs.sql models/diamonds_list_colors.sql models/diamonds_prices.sql\n\n``` \n```\n...\n... | 1 of 3 START table model default.diamonds_four_cs.................... [RUN]\n... | 1 of 3 OK created table model default.diamonds_four_cs............... [OK ...]\n... | 2 of 3 START view model default.diamonds_list_colors................. [RUN]\n... | 2 of 3 OK created view model default.diamonds_list_colors............ [OK ...]\n... | 3 of 3 START view model default.diamonds_prices...................... [RUN]\n... | 3 of 3 OK created view model default.diamonds_prices................. [OK ...]\n... |\n... | Finished running 1 table model, 2 view models ...\n\nCompleted successfully\n\nDone. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3\n\n```\n18. Run the following SQL code to list information about the new views and to select all rows from the table and views. \nIf you are connecting to a cluster, you can run this SQL code from a [notebook](https://docs.databricks.com/notebooks/notebooks-manage.html#create-a-notebook) that is attached to the cluster, specifying SQL as the default language for the notebook. If you are connecting to a SQL warehouse, you can run this SQL code from a [query](https://docs.databricks.com/sql/user/sql-editor/index.html#create-a-query). 
\n```\nSHOW views IN default\n\n``` \n```\n+-----------+----------------------+-------------+\n| namespace | viewName | isTemporary |\n+===========+======================+=============+\n| default | diamonds_list_colors | false |\n+-----------+----------------------+-------------+\n| default | diamonds_prices | false |\n+-----------+----------------------+-------------+\n\n``` \n```\nSELECT * FROM diamonds_four_cs\n\n``` \n```\n+-------+---------+-------+---------+\n| carat | cut | color | clarity |\n+=======+=========+=======+=========+\n| 0.23 | Ideal | E | SI2 |\n+-------+---------+-------+---------+\n| 0.21 | Premium | E | SI1 |\n+-------+---------+-------+---------+\n...\n\n``` \n```\nSELECT * FROM diamonds_list_colors\n\n``` \n```\n+-------+\n| color |\n+=======+\n| D |\n+-------+\n| E |\n+-------+\n...\n\n``` \n```\nSELECT * FROM diamonds_prices\n\n``` \n```\n+-------+---------+\n| color | price |\n+=======+=========+\n| J | 5323.82 |\n+-------+---------+\n| I | 5091.87 |\n+-------+---------+\n...\n\n``` \n### Step 2: Create and run more complex models \nIn this step, you create more complex models for a set of related data tables. These data tables contain information about a fictional sports league of three teams playing a season of six games. This procedure creates the data tables, creates the models, and runs the models. \n1. Run the following SQL code to create the necessary data tables. \nIf you are connecting to a cluster, you can run this SQL code from a [notebook](https://docs.databricks.com/notebooks/notebooks-manage.html#create-a-notebook) that is attached to the cluster, specifying SQL as the default language for the notebook. If you are connecting to a SQL warehouse, you can run this SQL code from a [query](https://docs.databricks.com/sql/user/sql-editor/index.html#create-a-query). \nThe tables and views in this step start with `zzz_` to help identify them as part of this example. 
You do not need to follow this pattern for your own tables and views. \n```\nDROP TABLE IF EXISTS zzz_game_opponents;\nDROP TABLE IF EXISTS zzz_game_scores;\nDROP TABLE IF EXISTS zzz_games;\nDROP TABLE IF EXISTS zzz_teams;\n\nCREATE TABLE zzz_game_opponents (\ngame_id INT,\nhome_team_id INT,\nvisitor_team_id INT\n) USING DELTA;\n\nINSERT INTO zzz_game_opponents VALUES (1, 1, 2);\nINSERT INTO zzz_game_opponents VALUES (2, 1, 3);\nINSERT INTO zzz_game_opponents VALUES (3, 2, 1);\nINSERT INTO zzz_game_opponents VALUES (4, 2, 3);\nINSERT INTO zzz_game_opponents VALUES (5, 3, 1);\nINSERT INTO zzz_game_opponents VALUES (6, 3, 2);\n\n-- Result:\n-- +---------+--------------+-----------------+\n-- | game_id | home_team_id | visitor_team_id |\n-- +=========+==============+=================+\n-- | 1 | 1 | 2 |\n-- +---------+--------------+-----------------+\n-- | 2 | 1 | 3 |\n-- +---------+--------------+-----------------+\n-- | 3 | 2 | 1 |\n-- +---------+--------------+-----------------+\n-- | 4 | 2 | 3 |\n-- +---------+--------------+-----------------+\n-- | 5 | 3 | 1 |\n-- +---------+--------------+-----------------+\n-- | 6 | 3 | 2 |\n-- +---------+--------------+-----------------+\n\nCREATE TABLE zzz_game_scores (\ngame_id INT,\nhome_team_score INT,\nvisitor_team_score INT\n) USING DELTA;\n\nINSERT INTO zzz_game_scores VALUES (1, 4, 2);\nINSERT INTO zzz_game_scores VALUES (2, 0, 1);\nINSERT INTO zzz_game_scores VALUES (3, 1, 2);\nINSERT INTO zzz_game_scores VALUES (4, 3, 2);\nINSERT INTO zzz_game_scores VALUES (5, 3, 0);\nINSERT INTO zzz_game_scores VALUES (6, 3, 1);\n\n-- Result:\n-- +---------+-----------------+--------------------+\n-- | game_id | home_team_score | visitor_team_score |\n-- +=========+=================+====================+\n-- | 1 | 4 | 2 |\n-- +---------+-----------------+--------------------+\n-- | 2 | 0 | 1 |\n-- +---------+-----------------+--------------------+\n-- | 3 | 1 | 2 |\n-- +---------+-----------------+--------------------+\n-- | 4 | 3 
| 2 |\n-- +---------+-----------------+--------------------+\n-- | 5 | 3 | 0 |\n-- +---------+-----------------+--------------------+\n-- | 6 | 3 | 1 |\n-- +---------+-----------------+--------------------+\n\nCREATE TABLE zzz_games (\ngame_id INT,\ngame_date DATE\n) USING DELTA;\n\nINSERT INTO zzz_games VALUES (1, '2020-12-12');\nINSERT INTO zzz_games VALUES (2, '2021-01-09');\nINSERT INTO zzz_games VALUES (3, '2020-12-19');\nINSERT INTO zzz_games VALUES (4, '2021-01-16');\nINSERT INTO zzz_games VALUES (5, '2021-01-23');\nINSERT INTO zzz_games VALUES (6, '2021-02-06');\n\n-- Result:\n-- +---------+------------+\n-- | game_id | game_date |\n-- +=========+============+\n-- | 1 | 2020-12-12 |\n-- +---------+------------+\n-- | 2 | 2021-01-09 |\n-- +---------+------------+\n-- | 3 | 2020-12-19 |\n-- +---------+------------+\n-- | 4 | 2021-01-16 |\n-- +---------+------------+\n-- | 5 | 2021-01-23 |\n-- +---------+------------+\n-- | 6 | 2021-02-06 |\n-- +---------+------------+\n\nCREATE TABLE zzz_teams (\nteam_id INT,\nteam_city VARCHAR(15)\n) USING DELTA;\n\nINSERT INTO zzz_teams VALUES (1, \"San Francisco\");\nINSERT INTO zzz_teams VALUES (2, \"Seattle\");\nINSERT INTO zzz_teams VALUES (3, \"Amsterdam\");\n\n-- Result:\n-- +---------+---------------+\n-- | team_id | team_city |\n-- +=========+===============+\n-- | 1 | San Francisco |\n-- +---------+---------------+\n-- | 2 | Seattle |\n-- +---------+---------------+\n-- | 3 | Amsterdam |\n-- +---------+---------------+\n\n```\n2. Create the first model: Click ![Create New File icon](https://docs.databricks.com/_images/dbt-cloud-create-new-file.png) (**Create New File**) in the upper-right corner.\n3. In the text editor, enter the following SQL statement. This statement creates a table that provides the details of each game, such as team names and scores. The `config` block instructs dbt to create a table in the database based on this statement. 
\n```\n-- Create a table that provides full details for each game, including\n-- the game ID, the home and visiting teams' city names and scores,\n-- the game winner's city name, and the game date.\n\n``` \n```\n{{ config(\nmaterialized='table',\nfile_format='delta'\n) }}\n\n``` \n```\n-- Step 4 of 4: Replace the visitor team IDs with their city names.\nselect\ngame_id,\nhome,\nt.team_city as visitor,\nhome_score,\nvisitor_score,\n-- Step 3 of 4: Display the city name for each game's winner.\ncase\nwhen\nhome_score > visitor_score\nthen\nhome\nwhen\nvisitor_score > home_score\nthen\nt.team_city\nend as winner,\ngame_date as date\nfrom (\n-- Step 2 of 4: Replace the home team IDs with their actual city names.\nselect\ngame_id,\nt.team_city as home,\nhome_score,\nvisitor_team_id,\nvisitor_score,\ngame_date\nfrom (\n-- Step 1 of 4: Combine data from various tables (for example, game and team IDs, scores, dates).\nselect\ng.game_id,\ngo.home_team_id,\ngs.home_team_score as home_score,\ngo.visitor_team_id,\ngs.visitor_team_score as visitor_score,\ng.game_date\nfrom\nzzz_games as g,\nzzz_game_opponents as go,\nzzz_game_scores as gs\nwhere\ng.game_id = go.game_id and\ng.game_id = gs.game_id\n) as all_ids,\nzzz_teams as t\nwhere\nall_ids.home_team_id = t.team_id\n) as visitor_ids,\nzzz_teams as t\nwhere\nvisitor_ids.visitor_team_id = t.team_id\norder by game_date desc\n\n```\n4. Click **Save As**.\n5. For the filename, enter `models/zzz_game_details.sql` and then click **Create**.\n6. Create a second model: Click ![Create New File icon](https://docs.databricks.com/_images/dbt-cloud-create-new-file.png) (**Create New File**) in the upper-right corner.\n7. In the text editor, enter the following SQL statement. This statement creates a view that lists team win-loss records for the season. 
\n```\n-- Create a view that summarizes the season's win and loss records by team.\n\n-- Step 2 of 2: Calculate the number of wins and losses for each team.\nselect\nwinner as team,\ncount(winner) as wins,\n-- Each team played in 4 games.\n(4 - count(winner)) as losses\nfrom (\n-- Step 1 of 2: Determine the winner and loser for each game.\nselect\ngame_id,\nwinner,\ncase\nwhen\nhome = winner\nthen\nvisitor\nelse\nhome\nend as loser\nfrom zzz_game_details\n)\ngroup by winner\norder by wins desc\n\n```\n8. Click **Save As**.\n9. For the filename, enter `models/zzz_win_loss_records.sql` and then click **Create**.\n10. Run the models: In the command line, run the `dbt run` command with the paths to the two preceding files. In the `default` database (as specified in your project settings), dbt creates one table named `zzz_game_details` and one view named `zzz_win_loss_records`. dbt gets these view and table names from their related `.sql` file names. \n```\ndbt run --model models/zzz_game_details.sql models/zzz_win_loss_records.sql\n\n``` \n```\n...\n... | 1 of 2 START table model default.zzz_game_details.................... [RUN]\n... | 1 of 2 OK created table model default.zzz_game_details............... [OK ...]\n... | 2 of 2 START view model default.zzz_win_loss_records................. [RUN]\n... | 2 of 2 OK created view model default.zzz_win_loss_records............ [OK ...]\n... |\n... | Finished running 1 table model, 1 view model ...\n\nCompleted successfully\n\nDone. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2\n\n```\n11. Run the following SQL code to list information about the new view and to select all rows from the table and view. \nIf you are connecting to a cluster, you can run this SQL code from a [notebook](https://docs.databricks.com/notebooks/notebooks-manage.html#create-a-notebook) that is attached to the cluster, specifying SQL as the default language for the notebook. 
If you are connecting to a SQL warehouse, you can run this SQL code from a [query](https://docs.databricks.com/sql/user/sql-editor/index.html#create-a-query). \n```\nSHOW VIEWS FROM default LIKE 'zzz_win_loss_records';\n\n``` \n```\n+-----------+----------------------+-------------+\n| namespace | viewName | isTemporary |\n+===========+======================+=============+\n| default | zzz_win_loss_records | false |\n+-----------+----------------------+-------------+\n\n``` \n```\nSELECT * FROM zzz_game_details;\n\n``` \n```\n+---------+---------------+---------------+------------+---------------+---------------+------------+\n| game_id | home | visitor | home_score | visitor_score | winner | date |\n+=========+===============+===============+============+===============+===============+============+\n| 1 | San Francisco | Seattle | 4 | 2 | San Francisco | 2020-12-12 |\n+---------+---------------+---------------+------------+---------------+---------------+------------+\n| 2 | San Francisco | Amsterdam | 0 | 1 | Amsterdam | 2021-01-09 |\n+---------+---------------+---------------+------------+---------------+---------------+------------+\n| 3 | Seattle | San Francisco | 1 | 2 | San Francisco | 2020-12-19 |\n+---------+---------------+---------------+------------+---------------+---------------+------------+\n| 4 | Seattle | Amsterdam | 3 | 2 | Seattle | 2021-01-16 |\n+---------+---------------+---------------+------------+---------------+---------------+------------+\n| 5 | Amsterdam | San Francisco | 3 | 0 | Amsterdam | 2021-01-23 |\n+---------+---------------+---------------+------------+---------------+---------------+------------+\n| 6 | Amsterdam | Seattle | 3 | 1 | Amsterdam | 2021-02-06 |\n+---------+---------------+---------------+------------+---------------+---------------+------------+\n\n``` \n```\nSELECT * FROM zzz_win_loss_records;\n\n``` \n```\n+---------------+------+--------+\n| team | wins | losses |\n+===============+======+========+\n| Amsterdam 
| 3 | 1 |\n+---------------+------+--------+\n| San Francisco | 2 | 2 |\n+---------------+------+--------+\n| Seattle | 1 | 3 |\n+---------------+------+--------+\n\n``` \n### Step 3: Create and run tests \nIn this step, you create *tests*, which are assertions you make about your models. When you run these tests, dbt tells you if each test in your project passes or fails. \nThere are two types of tests. *Schema tests*, written in YAML, return the number of records that do not pass an assertion. When this number is zero, all records pass, so the test passes. *Data tests* are specific queries that must return zero records to pass. \n1. Create the schema tests: Click ![Create New File icon](https://docs.databricks.com/_images/dbt-cloud-create-new-file.png) (**Create New File**) in the upper-right corner.\n2. In the text editor, enter the following content. This file includes schema tests that determine whether the specified columns have unique values, are not null, have only the specified values, or a combination. \n```\nversion: 2\n\nmodels:\n  - name: zzz_game_details\n    columns:\n      - name: game_id\n        tests:\n          - unique\n          - not_null\n      - name: home\n        tests:\n          - not_null\n          - accepted_values:\n              values: ['Amsterdam', 'San Francisco', 'Seattle']\n      - name: visitor\n        tests:\n          - not_null\n          - accepted_values:\n              values: ['Amsterdam', 'San Francisco', 'Seattle']\n      - name: home_score\n        tests:\n          - not_null\n      - name: visitor_score\n        tests:\n          - not_null\n      - name: winner\n        tests:\n          - not_null\n          - accepted_values:\n              values: ['Amsterdam', 'San Francisco', 'Seattle']\n      - name: date\n        tests:\n          - not_null\n  - name: zzz_win_loss_records\n    columns:\n      - name: team\n        tests:\n          - unique\n          - not_null\n          - relationships:\n              to: ref('zzz_game_details')\n              field: home\n      - name: wins\n        tests:\n          - not_null\n      - name: losses\n        tests:\n          - not_null\n\n```\n3. Click **Save As**.\n4. For the filename, enter `models/schema.yml`, and then click **Create**.\n5. 
Create the first data test: Click ![Create New File icon](https://docs.databricks.com/_images/dbt-cloud-create-new-file.png) (**Create New File**) in the upper-right corner.\n6. In the text editor, enter the following SQL statement. This file includes a data test to determine whether any games happened outside of the regular season. \n```\n-- This season's games happened between 2020-12-12 and 2021-02-06.\n-- For this test to pass, this query must return no results.\n\nselect date\nfrom zzz_game_details\nwhere date < '2020-12-12'\nor date > '2021-02-06'\n\n```\n7. Click **Save As**.\n8. For the filename, enter `tests/zzz_game_details_check_dates.sql`, and then click **Create**.\n9. Create a second data test: Click ![Create New File icon](https://docs.databricks.com/_images/dbt-cloud-create-new-file.png) (**Create New File**) in the upper-right corner.\n10. In the text editor, enter the following SQL statement. This file includes a data test to determine whether any scores were negative or any games were tied. \n```\n-- This sport allows no negative scores or tie games.\n-- For this test to pass, this query must return no results.\n\nselect home_score, visitor_score\nfrom zzz_game_details\nwhere home_score < 0\nor visitor_score < 0\nor home_score = visitor_score\n\n```\n11. Click **Save As**.\n12. For the filename, enter `tests/zzz_game_details_check_scores.sql`, and then click **Create**.\n13. Create a third data test: Click ![Create New File icon](https://docs.databricks.com/_images/dbt-cloud-create-new-file.png) (**Create New File**) in the upper-right corner.\n14. In the text editor, enter the following SQL statement. This file includes a data test to determine whether any teams had negative win or loss records, had more win or loss records than games played, or played more games than were allowed. 
\n```\n-- Each team participated in 4 games this season.\n-- For this test to pass, this query must return no results.\n\nselect wins, losses\nfrom zzz_win_loss_records\nwhere wins < 0 or wins > 4\nor losses < 0 or losses > 4\nor (wins + losses) > 4\n\n```\n15. Click **Save As**.\n16. For the filename, enter `tests/zzz_win_loss_records_check_records.sql`, and then click **Create**.\n17. Run the tests: In the command line, run the `dbt test` command. \n### Step 4: Clean up \nYou can delete the tables and views you created for this example by running the following SQL code. \nIf you are connecting to a cluster, you can run this SQL code from a [notebook](https://docs.databricks.com/notebooks/notebooks-manage.html#create-a-notebook) that is attached to the cluster, specifying SQL as the default language for the notebook. If you are connecting to a SQL warehouse, you can run this SQL code from a [query](https://docs.databricks.com/sql/user/sql-editor/index.html#create-a-query). \n```\nDROP TABLE zzz_game_opponents;\nDROP TABLE zzz_game_scores;\nDROP TABLE zzz_games;\nDROP TABLE zzz_teams;\nDROP TABLE zzz_game_details;\nDROP VIEW zzz_win_loss_records;\n\nDROP TABLE diamonds;\nDROP TABLE diamonds_four_cs;\nDROP VIEW diamonds_list_colors;\nDROP VIEW diamonds_prices;\n\n```\n\n", "chunk_id": "26c2faf425c118d00de782fb92f77fed", "url": "https://docs.databricks.com/partners/prep/dbt-cloud.html"} +{"chunked_text": "# Technology partners\n## Connect to data prep partners using Partner Connect\n#### Connect to dbt Cloud\n##### Next steps\n\n* Learn more about dbt [models](https://docs.getdbt.com/docs/building-a-dbt-project/building-models).\n* Learn how to [test](https://docs.getdbt.com/docs/building-a-dbt-project/tests) your dbt projects.\n* Learn how to use [Jinja](https://docs.getdbt.com/docs/building-a-dbt-project/jinja-macros), a templating language, for programming SQL in your dbt projects.\n* Learn about dbt [best 
practices](https://docs.getdbt.com/docs/guides/best-practices).\n\n", "chunk_id": "e183a3ebe420c9d3b755343683cf8c64", "url": "https://docs.databricks.com/partners/prep/dbt-cloud.html"} +{"chunked_text": "# Technology partners\n## Connect to data prep partners using Partner Connect\n#### Connect to dbt Cloud\n##### Additional resources\n\n* [What, exactly, is dbt?](https://www.getdbt.com/blog/what-exactly-is-dbt)\n* [General dbt documentation](https://docs.getdbt.com/docs/introduction)\n* [dbt-core GitHub repository](https://github.com/dbt-labs/dbt)\n* [dbt CLI](https://docs.getdbt.com/dbt-cli/cli-overview)\n* [dbt pricing](https://www.getdbt.com/pricing/)\n* [Analytics Engineering for Everyone: Databricks in dbt Cloud](https://blog.getdbt.com/analytics-engineering-for-everyone-databricks-in-dbt-cloud/)\n* [dbt Cloud overview](https://docs.getdbt.com/docs/dbt-cloud/cloud-overview)\n* [Connecting to Databricks](https://docs.getdbt.com/docs/dbt-cloud/cloud-configuring-dbt-cloud/connecting-your-database#connecting-to-databricks)\n* [dbt Discourse community](https://discourse.getdbt.com/)\n* [dbt blog](https://blog.getdbt.com/)\n* [Support](https://docs.getdbt.com/docs/dbt-cloud/cloud-dbt-cloud-support)\n\n", "chunk_id": "728879e5ed9bbabbb5534739fbfdbeab", "url": "https://docs.databricks.com/partners/prep/dbt-cloud.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Data types\n##### `MAP` type\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nRepresents values comprising a set of key-value pairs.\n\n##### `MAP` type\n###### Syntax\n\n```\nMAP <keyType, valueType>\n\n``` \n* `keyType`: Any data type other than MAP specifying the keys.\n* `valueType`: Any data type specifying the values.\n\n##### `MAP` type\n###### Limits\n\nThe map type supports maps of any cardinality greater than or equal to 0. 
\nThe keys must be unique and not be NULL. \n`MAP` is not a comparable data type.\n\n##### `MAP` type\n###### Literals\n\nSee [map function](https://docs.databricks.com/sql/language-manual/functions/map.html) for details on how to produce literal map values. \nSee [[ ] operator](https://docs.databricks.com/sql/language-manual/functions/bracketsign.html) for details on how to retrieve values from a map by key.\n\n", "chunk_id": "2e65f438df9ea5994e0a2039acea6f91", "url": "https://docs.databricks.com/sql/language-manual/data-types/map-type.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Data types\n##### `MAP` type\n###### Examples\n\n```\n> SELECT map('red', 1, 'green', 2);\n{red->1, green->2}\n\n> SELECT typeof(CAST(NULL AS MAP));\nMAP\n\n> SELECT map(array(1, 2), map('green', 5));\n{[1, 2]->{green->5}}\n\n> SELECT CAST(map(struct('Hello', 'World'), 'Greeting') AS MAP, string>);\n{{Hello, World}->Greeting}\n\n> SELECT m['red'] FROM VALUES(map('red', 1, 'green', 2)) AS T(m);\n1\n\n> SELECT map('red', 1) = map('red', 1);\nError: EqualTo does not support ordering on type map\n\n```\n\n##### `MAP` type\n###### Related\n\n* [[ ] operator](https://docs.databricks.com/sql/language-manual/functions/bracketsign.html)\n* [ARRAY type](https://docs.databricks.com/sql/language-manual/data-types/array-type.html)\n* [STRUCT type](https://docs.databricks.com/sql/language-manual/data-types/struct-type.html)\n* [map function](https://docs.databricks.com/sql/language-manual/functions/map.html)\n* [cast function](https://docs.databricks.com/sql/language-manual/functions/cast.html)\n\n", "chunk_id": "7f443f5318a323a1e214baf870cea893", "url": "https://docs.databricks.com/sql/language-manual/data-types/map-type.html"} +{"chunked_text": "# Security and compliance guide\n## Authentication and access control\n#### Access control lists\n\nThis article describes details about the permissions available for the different workspace objects. 
\nNote \nAccess control requires the [Premium plan or above](https://databricks.com/product/pricing/platform-addons). \nAccess control settings are disabled by default on workspaces that are upgraded from the Standard plan to the Premium plan or above. Once an access control setting is enabled, it cannot be disabled. For more information, see [Access controls lists can be enabled on upgraded workspaces](https://docs.databricks.com/release-notes/product/2024/january.html#acls).\n\n#### Access control lists\n##### Access control lists overview\n\nIn Databricks, you can use access control lists (ACLs) to configure permission to access workspace-level objects. Workspace admins have the CAN MANAGE permission on all objects in their workspace, which gives them the ability to manage permissions on all objects in their workspaces. Users automatically have the CAN MANAGE permission for objects that they create. \nFor an example of how to map typical personas to workspace-level permissions, see the [Proposal for Getting Started With Databricks Groups and Permissions](https://www.databricks.com/discover/pages/access-control). \n### Manage access control lists with folders \nYou can manage workspace object permissions by adding objects to folders. Objects in a folder inherit all permissions settings of that folder. For example, a user who has the CAN RUN permission on a folder has the CAN RUN permission on the alerts in that folder. 
To learn about organizing objects into folders, see [Workspace browser](https://docs.databricks.com/workspace/workspace-browser/index.html).\n\n", "chunk_id": "806390834716dec387d2a92b576fed5f", "url": "https://docs.databricks.com/security/auth-authz/access-control/index.html"} +{"chunked_text": "# Security and compliance guide\n## Authentication and access control\n#### Access control lists\n##### Alerts ACLs\n\n| Ability | NO PERMISSIONS | CAN RUN | CAN MANAGE |\n| --- | --- | --- | --- |\n| See in alert list | | x | x |\n| View alert and result | | x | x |\n| Manually trigger alert run | | x | x |\n| Subscribe to notifications | | x | x |\n| Edit alert | | | x |\n| Modify permissions | | | x |\n| Delete alert | | | x |\n\n#### Access control lists\n##### Compute ACLs\n\n| Ability | NO PERMISSIONS | CAN ATTACH TO | CAN RESTART | CAN MANAGE |\n| --- | --- | --- | --- | --- |\n| Attach notebook to cluster | | x | x | x |\n| View Spark UI | | x | x | x |\n| View cluster metrics | | x | x | x |\n| View driver logs | | x | x | x |\n| Terminate cluster | | | x | x |\n| Start and restart cluster | | | x | x |\n| Edit cluster | | | | x |\n| Attach library to cluster | | | | x |\n| Resize cluster | | | | x |\n| Modify permissions | | | | x |\n\n#### Access control lists\n##### Legacy dashboard ACLs\n\n| Ability | NO PERMISSIONS | CAN VIEW | CAN RUN | CAN EDIT | CAN MANAGE |\n| --- | --- | --- | --- | --- | --- |\n| See in dashboard list | | x | x | x | x |\n| View dashboard and results | | x | x | x | x |\n| Refresh query results in the dashboard (or choose different parameters) | | | x | x | x |\n| Edit dashboard | | | | x | x |\n| Modify permissions | | | | | x |\n| Delete dashboard | | | | | x | \nEditing a legacy dashboard requires the **Run as viewer** sharing setting. 
See [Refresh behavior and execution context](https://docs.databricks.com/sql/user/dashboards/index.html#sharing-setting).\n\n", "chunk_id": "a56b789e71af37b44fd9ce8c39c2d0a2", "url": "https://docs.databricks.com/security/auth-authz/access-control/index.html"} +{"chunked_text": "# Security and compliance guide\n## Authentication and access control\n#### Access control lists\n##### Delta Live Tables ACLs\n\n| Ability | NO PERMISSIONS | CAN VIEW | CAN RUN | CAN MANAGE | IS OWNER |\n| --- | --- | --- | --- | --- | --- |\n| View pipeline details and list pipeline | | x | x | x | x |\n| View Spark UI and driver logs | | x | x | x | x |\n| Start and stop a pipeline update | | | x | x | x |\n| Stop pipeline clusters directly | | | x | x | x |\n| Edit pipeline settings | | | | x | x |\n| Delete the pipeline | | | | x | x |\n| Purge runs and experiments | | | | x | x |\n| Modify permissions | | | | x | x |\n\n#### Access control lists\n##### Feature tables ACLs\n\n| Ability | CAN VIEW METADATA | CAN EDIT METADATA | CAN MANAGE |\n| --- | --- | --- | --- |\n| Read feature table | X | X | X |\n| Search feature table | X | X | X |\n| Publish feature table to online store | X | X | X |\n| Write features to feature table | | X | X |\n| Update description of feature table | | X | X |\n| Modify permissions | | | X |\n| Delete feature table | | | X |\n\n#### Access control lists\n##### File ACLs\n\n| Ability | NO PERMISSIONS | CAN READ | CAN RUN | CAN EDIT | CAN MANAGE |\n| --- | --- | --- | --- | --- | --- |\n| Read file | | x | x | x | x |\n| Comment | | x | x | x | x |\n| Attach and detach file | | | x | x | x |\n| Run file interactively | | | x | x | x |\n| Edit file | | | | x | x |\n| Modify permissions | | | | | x |\n\n", "chunk_id": "7f397c98ef69535546e6a8ec911e7900", "url": "https://docs.databricks.com/security/auth-authz/access-control/index.html"} +{"chunked_text": "# Security and compliance guide\n## Authentication and access control\n#### Access control lists\n##### 
Folder ACLs\n\n| Ability | NO PERMISSIONS | CAN READ | CAN EDIT | CAN RUN | CAN MANAGE |\n| --- | --- | --- | --- | --- | --- |\n| List objects in folder | x | x | x | x | x |\n| View objects in folder | | x | x | x | x |\n| Clone and export items | | | x | x | x |\n| Run objects in the folder | | | | x | x |\n| Create, import, and delete items | | | | | x |\n| Move and rename items | | | | | x |\n| Modify permissions | | | | | x |\n\n#### Access control lists\n##### Git folder ACLs\n\n| Ability | NO PERMISSIONS | CAN READ | CAN RUN | CAN EDIT | CAN MANAGE |\n| --- | --- | --- | --- | --- | --- |\n| List assets in a folder | x | x | x | x | x |\n| View assets in a folder | | x | x | x | x |\n| Clone and export assets | | x | x | x | x |\n| Run executable assets in folder | | | x | x | x |\n| Edit and rename assets in a folder | | | | x | x |\n| Create a branch in a folder | | | | | x |\n| Pull or push a branch into a folder | | | | | x |\n| Create, import, delete, and move assets | | | | | x |\n| Modify permissions | | | | | x |\n\n", "chunk_id": "6e4263fec80026b1cb47c5af1af8f5ab", "url": "https://docs.databricks.com/security/auth-authz/access-control/index.html"} +{"chunked_text": "# Security and compliance guide\n## Authentication and access control\n#### Access control lists\n##### Job ACLs\n\n| Ability | NO PERMISSIONS | CAN VIEW | CAN MANAGE RUN | IS OWNER | CAN MANAGE |\n| --- | --- | --- | --- | --- | --- |\n| View job details and settings | | x | x | x | x |\n| View results | | x | x | x | x |\n| View Spark UI, logs of a job run | | | x | x | x |\n| Run now | | | x | x | x |\n| Cancel run | | | x | x | x |\n| Edit job settings | | | | x | x |\n| Delete job | | | | x | x |\n| Modify permissions | | | | x | x |\n\n#### Access control lists\n##### Dashboard ACLs\n\n| Ability | NO PERMISSIONS | CAN VIEW/CAN RUN | CAN EDIT | CAN MANAGE |\n| --- | --- | --- | --- | --- |\n| View dashboard and results | | x | x | x |\n| Interact with widgets | | x | x | x |\n| 
Refresh the dashboard | | x | x | x |\n| Edit dashboard | | | x | x |\n| Clone dashboard | | x | x | x |\n| Publish dashboard snapshot | | | x | x |\n| Modify permissions | | | | x |\n| Delete dashboard | | | | x |\n\n#### Access control lists\n##### MLFlow experiment ACLs\n\n| Ability | NO PERMISSIONS | CAN READ | CAN EDIT | CAN MANAGE |\n| --- | --- | --- | --- | --- |\n| View run info search compare runs | | x | x | x |\n| View, list, and download run artifacts | | x | x | x |\n| Create, delete, and restore runs | | | x | x |\n| Log run params, metrics, tags | | | x | x |\n| Log run artifacts | | | x | x |\n| Edit experiment tags | | | x | x |\n| Purge runs and experiments | | | | x |\n| Modify permissions | | | | x |\n\n", "chunk_id": "7b71fdf8ad85a93fc66bba9ff91bfaa8", "url": "https://docs.databricks.com/security/auth-authz/access-control/index.html"} +{"chunked_text": "# Security and compliance guide\n## Authentication and access control\n#### Access control lists\n##### MLFlow model ACLs\n\n| Ability | NO PERMISSIONS | CAN READ | CAN EDIT | CAN MANAGE STAGING VERSIONS | CAN MANAGE PRODUCTION VERSIONS | CAN MANAGE |\n| --- | --- | --- | --- | --- | --- | --- |\n| View model details, versions, stage transition requests, activities, and artifact download URIs | | x | x | x | x | x |\n| Request a model version stage transition | | x | x | x | x | x |\n| Add a version to a model | | | x | x | x | x |\n| Update model and version description | | | x | x | x | x |\n| Add or edit tags | | | x | x | x | x |\n| Transition model version between stages | | | | x | x | x |\n| Approve a transition request | | | | x | x | x |\n| Cancel a transition request | | | | | | x |\n| Rename model | | | | | | x |\n| Modify permissions | | | | | | x |\n| Delete model and model versions | | | | | | x |\n\n#### Access control lists\n##### Notebook ACLs\n\n| Ability | NO PERMISSIONS | CAN READ | CAN RUN | CAN EDIT | CAN MANAGE |\n| --- | --- | --- | --- | --- | --- |\n| View cells | | x 
| x | x | x |\n| Comment | | x | x | x | x |\n| Run via %run or notebook workflows | | x | x | x | x |\n| Attach and detach notebooks | | | x | x | x |\n| Run commands | | | x | x | x |\n| Edit cells | | | | x | x |\n| Modify permissions | | | | | x |\n\n#### Access control lists\n##### Pool ACLs\n\n| Ability | NO PERMISSIONS | CAN ATTACH TO | CAN MANAGE |\n| --- | --- | --- | --- |\n| Attach cluster to pool | | x | x |\n| Delete pool | | | x |\n| Edit pool | | | x |\n| Modify permissions | | | x |\n\n", "chunk_id": "262420f62b560b5bdae953379d6e9048", "url": "https://docs.databricks.com/security/auth-authz/access-control/index.html"} +{"chunked_text": "# Security and compliance guide\n## Authentication and access control\n#### Access control lists\n##### Query ACLs\n\n| Ability | NO PERMISSIONS | CAN VIEW | CAN RUN | CAN EDIT | CAN MANAGE |\n| --- | --- | --- | --- | --- | --- |\n| View own queries | | x | x | x | x |\n| See in query list | | x | x | x | x |\n| View query text | | x | x | x | x |\n| View query result | | x | x | x | x |\n| Refresh query result (or choose different parameters) | | | x | x | x |\n| Include the query in a dashboard | | | x | x | x |\n| Edit query text | | | | x | x |\n| Change SQL warehouse or data source | | | | | x |\n| Modify permissions | | | | | x |\n| Delete query | | | | | x |\n\n#### Access control lists\n##### Secret ACLs\n\n| Ability | READ | WRITE | MANAGE |\n| --- | --- | --- | --- |\n| Read the secret scope | x | x | x |\n| List secrets in the scope | x | x | x |\n| Write to the secret scope | | x | x |\n| Modify permissions | | | x |\n\n#### Access control lists\n##### Serving endpoint ACLs\n\n| Ability | NO PERMISSIONS | CAN VIEW | CAN QUERY | CAN MANAGE |\n| --- | --- | --- | --- | --- |\n| Get endpoint | | x | x | x |\n| List endpoint | | x | x | x |\n| Query endpoint | | | x | x |\n| Update endpoint config | | | | x |\n| Delete endpoint | | | | x |\n| Modify permissions | | | | x |\n\n", "chunk_id": 
"2e0ed9b8e1915c36250ff17e878d184a", "url": "https://docs.databricks.com/security/auth-authz/access-control/index.html"} +{"chunked_text": "# Security and compliance guide\n## Authentication and access control\n#### Access control lists\n##### SQL warehouse ACLs\n\n| Ability | NO PERMISSIONS | CAN USE | IS OWNER | CAN MANAGE |\n| --- | --- | --- | --- | --- |\n| Start the warehouse | | x | x | x |\n| View details for the warehouse | | x | x | x |\n| View all queries for the warehouse | | | x | x |\n| View warehouse monitoring tab | | | x | x |\n| Stop the warehouse | | | x | x |\n| Delete the warehouse | | | x | x |\n| Edit the warehouse | | | x | x |\n| Modify permissions | | | x | x |\n\n", "chunk_id": "fd31767c32d2c5f6e57a4cd38a03fe4e", "url": "https://docs.databricks.com/security/auth-authz/access-control/index.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### ALTER CONNECTION\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 13.3 LTS and above ![check marked yes](https://docs.databricks.com/_images/check.png) Unity Catalog only \nTransfers the ownership of a connection to a new [principal](https://docs.databricks.com/sql/language-manual/sql-ref-principal.html), renames a connection, or changes the connection options. \nTo set a comment on a connection use [COMMENT ON CONNECTION](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-comment.html).\n\n#### ALTER CONNECTION\n##### Syntax\n\n```\nALTER CONNECTION connection_name\n{ [ SET ] OWNER TO principal |\nRENAME TO new_connection_name |\nOPTIONS ( option value [, ...] 
)\n\n```\n\n", "chunk_id": "5278235538123f1b95ad988c6c60bdbe", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-alter-connection.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### ALTER CONNECTION\n##### Parameters\n\n* **[connection\\_name](https://docs.databricks.com/sql/language-manual/sql-ref-names.html#connection-name)** \nThe name of the connection to be altered.\n* **[ SET ] OWNER TO [principal](https://docs.databricks.com/sql/language-manual/sql-ref-principal.html)** \nTransfers ownership of the connection to `principal`.\n* **RENAME TO [new\\_connection\\_name](https://docs.databricks.com/sql/language-manual/sql-ref-names.html#connection-name)** \nSpecifies a new name for the connection. The name must be unique within the Unity Catalog metastore.\n* **OPTIONS** \nSets `connection_type` specific parameters needed to establish the connection. \nReplaces the existing list of options with a new list of options. \n+ **option** \nThe property key. The key can consist of one or more [identifiers](https://docs.databricks.com/sql/language-manual/sql-ref-identifiers.html) separated by a dot, or a `STRING` literal. \nProperty keys must be unique and are case-sensitive.\n+ **value** \nThe value for the property. The value must be a `BOOLEAN`, `STRING`, `INTEGER`, or `DECIMAL` constant expression. 
\nFor example, the `value` for `password` can be the constant expression `secret('secrets.r.us', 'postgresPassword')` instead of the literal password.\n\n#### ALTER CONNECTION\n##### Examples\n\n```\n> ALTER CONNECTION mysql_connection SET OWNER TO `alf@melmak.et`;\n\n> ALTER CONNECTION mysql_connection RENAME TO `other_mysql_connection`;\n\n> ALTER CONNECTION mysql_connection OPTIONS (host 'newmysqlhost.us-west-2.amazonaws.com', port '3306');\n\n```\n\n", "chunk_id": "c409c169bbcf3bcfd35f03bfdc3d5c9c", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-alter-connection.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### ALTER CONNECTION\n##### Related articles\n\n* [COMMENT ON CONNECTION](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-comment.html)\n* [CREATE CONNECTION](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-connection.html)\n* [DESCRIBE CONNECTION](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-describe-connection.html)\n* [DROP CONNECTION](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-drop-connection.html)\n* [SHOW CONNECTIONS](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-show-connections.html)\n\n", "chunk_id": "e7801ac41105e8d2d9b0fb154d58e7e1", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-alter-connection.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Data types\n##### `BOOLEAN` type\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nRepresents Boolean values.\n\n##### `BOOLEAN` type\n###### Syntax\n\n```\nBOOLEAN\n\n```\n\n##### `BOOLEAN` type\n###### Limits\n\nThe type supports true and false values.\n\n##### `BOOLEAN` type\n###### Literals\n\n```\n{ TRUE | FALSE
}\n\n```\n\n##### `BOOLEAN` type\n###### Examples\n\n```\n> SELECT true;\nTRUE\n\n> SELECT typeof(false);\nBOOLEAN\n\n> SELECT CAST(0 AS BOOLEAN);\nFALSE\n\n> SELECT CAST(-1 AS BOOLEAN);\nTRUE\n\n> SELECT CAST('true' AS BOOLEAN);\nTRUE\n\n```\n\n##### `BOOLEAN` type\n###### Related\n\n* [cast function](https://docs.databricks.com/sql/language-manual/functions/cast.html)\n\n", "chunk_id": "1ac47ec6bba6aa27f5cad5f4d9077a69", "url": "https://docs.databricks.com/sql/language-manual/data-types/boolean-type.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n\nNote \nThis feature requires the [Premium plan or above](https://databricks.com/product/pricing/platform-addons). \nThis article provides you with a comprehensive reference of available audit log services and events. By understanding which events are logged in the audit logs, your enterprise can monitor detailed Databricks usage patterns in your account. \nThe easiest way to access and query your account\u2019s audit logs is by using [system tables (Public Preview)](https://docs.databricks.com/admin/system-tables/index.html). \nIf you\u2019d like to configure a regular log delivery, see [Configure audit log delivery](https://docs.databricks.com/admin/account-settings/audit-log-delivery.html).\n\n", "chunk_id": "7ede99db5808667520cf0a035dbb3177", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Audit log services\n\nThe following services and their events are logged by default in audit logs. \n### Workspace-level services \nWorkspace-level audit logs are available for these services: \n| Service name | Description |\n| --- | --- |\n| [accounts](https://docs.databricks.com/admin/account-settings/audit-logs.html#accounts) | Events related to accounts, users, groups, and IP access lists. 
|\n| [clusterPolicies](https://docs.databricks.com/admin/account-settings/audit-logs.html#clusterpolicies) | Events related to cluster policies. |\n| [clusters](https://docs.databricks.com/admin/account-settings/audit-logs.html#clusters) | Events related to clusters. |\n| [dashboards](https://docs.databricks.com/admin/account-settings/audit-logs.html#dashboards) | Events related to Lakeview dashboard use. |\n| [databrickssql](https://docs.databricks.com/admin/account-settings/audit-logs.html#dbsql) | Events related to Databricks SQL use. |\n| [dbfs](https://docs.databricks.com/admin/account-settings/audit-logs.html#dbfs) | Events related to [DBFS](https://docs.databricks.com/dbfs/index.html). |\n| [deltaPipelines](https://docs.databricks.com/admin/account-settings/audit-logs.html#delta-pipelines) | Events related to [Delta Live Table pipelines](https://docs.databricks.com/delta-live-tables/index.html). |\n| [featureStore](https://docs.databricks.com/admin/account-settings/audit-logs.html#feature-store) | Events related to the [Databricks Feature Store](https://docs.databricks.com/machine-learning/feature-store/index.html). |\n| [filesystem](https://docs.databricks.com/admin/account-settings/audit-logs.html#filesystem) | Events related to the Files API. |\n| [genie](https://docs.databricks.com/admin/account-settings/audit-logs.html#genie) | Events related to workspace access by support personnel. |\n| [gitCredentials](https://docs.databricks.com/admin/account-settings/audit-logs.html#git-credentials) | Events related to Git credentials for [Databricks Git folders](https://docs.databricks.com/repos/index.html). See also `repos`. |\n| [globalInitScripts](https://docs.databricks.com/admin/account-settings/audit-logs.html#init-scripts) | Events related to global init scripts. |\n| [groups](https://docs.databricks.com/admin/account-settings/audit-logs.html#groups) | Events related to account and workspace groups. 
|\n| [iamRole](https://docs.databricks.com/admin/account-settings/audit-logs.html#iam-role) | Events related to IAM role permissions. |\n| [ingestion](https://docs.databricks.com/admin/account-settings/audit-logs.html#ingestion) | Events related to file uploads. |\n| [instancePools](https://docs.databricks.com/admin/account-settings/audit-logs.html#instance-pool) | Events related to [pools](https://docs.databricks.com/compute/pools.html). |\n| [jobs](https://docs.databricks.com/admin/account-settings/audit-logs.html#jobs) | Events related to jobs. |\n| [marketplaceConsumer](https://docs.databricks.com/admin/account-settings/audit-logs.html#marketplace-consumer) | Events related to consumer actions in Databricks Marketplace. |\n| [marketplaceProvider](https://docs.databricks.com/admin/account-settings/audit-logs.html#marketplace-provider) | Events related to provider actions in Databricks Marketplace. |\n| [mlflowAcledArtifact](https://docs.databricks.com/admin/account-settings/audit-logs.html#artifacts) | Events related to MLflow artifacts with ACLs. |\n| [mlflowExperiment](https://docs.databricks.com/admin/account-settings/audit-logs.html#experiment) | Events related to MLflow experiments. |\n| [modelRegistry](https://docs.databricks.com/admin/account-settings/audit-logs.html#model-registry) | Events related to the workspace model registry. For activity logs for models in Unity Catalog, see [Unity Catalog events](https://docs.databricks.com/admin/account-settings/audit-logs.html#uc). |\n| [notebook](https://docs.databricks.com/admin/account-settings/audit-logs.html#notebook) | Events related to notebooks. |\n| [partnerConnect](https://docs.databricks.com/admin/account-settings/audit-logs.html#partner-connect) | Events related to [Partner Connect](https://docs.databricks.com/partner-connect/index.html).
|\n| [remoteHistoryService](https://docs.databricks.com/admin/account-settings/audit-logs.html#remote-history) | Events related to adding and removing GitHub credentials. |\n| [repos](https://docs.databricks.com/admin/account-settings/audit-logs.html#repos) | Events related to [Databricks Git folders](https://docs.databricks.com/repos/index.html). See also `gitCredentials`. |\n| [secrets](https://docs.databricks.com/admin/account-settings/audit-logs.html#secrets) | Events related to [secrets](https://docs.databricks.com/api/workspace/secrets). |\n| [serverlessRealTimeInference](https://docs.databricks.com/admin/account-settings/audit-logs.html#realtime) | Events related to [model serving](https://docs.databricks.com/machine-learning/model-serving/index.html). |\n| [sqlPermissions](https://docs.databricks.com/admin/account-settings/audit-logs.html#permissions) | Events related to the legacy Hive metastore table access control. |\n| [ssh](https://docs.databricks.com/admin/account-settings/audit-logs.html#ssh) | Events related to [SSH access](https://docs.databricks.com/archive/compute/configure.html#ssh-access). |\n| [vectorSearch](https://docs.databricks.com/admin/account-settings/audit-logs.html#vector-search) | Events related to [Databricks Vector Search](https://docs.databricks.com/generative-ai/vector-search.html). |\n| [webTerminal](https://docs.databricks.com/admin/account-settings/audit-logs.html#web-terminal) | Events related to the [web terminal](https://docs.databricks.com/compute/web-terminal.html) feature. |\n| [workspace](https://docs.databricks.com/admin/account-settings/audit-logs.html#workspace) | Events related to workspaces. | \n### Account-level services \nAccount-level audit logs are available for these services: \n| Service name | Description |\n| --- | --- |\n| [accountBillableUsage](https://docs.databricks.com/admin/account-settings/audit-logs.html#billable) | Actions related to billable usage access in the account console.
|\n| [accounts](https://docs.databricks.com/admin/account-settings/audit-logs.html#account-accounts) | Actions related to account-level access and identity management. |\n| [accountsAccessControl](https://docs.databricks.com/admin/account-settings/audit-logs.html#accounts-access-control) | Actions related to account-level access control rules. |\n| [accountsManager](https://docs.databricks.com/admin/account-settings/audit-logs.html#manager) | Actions performed in the account console. |\n| [logDelivery](https://docs.databricks.com/admin/account-settings/audit-logs.html#log-delivery) | Log delivery configuration for items such as billable usage or audit logs. |\n| [oauth2](https://docs.databricks.com/admin/account-settings/audit-logs.html#oauth) | Actions related to OAuth SSO authentication to the account console. |\n| [servicePrincipalCredentials](https://docs.databricks.com/admin/account-settings/audit-logs.html#service-principal) | Actions related to service principal credentials. |\n| [ssoConfigBackend](https://docs.databricks.com/admin/account-settings/audit-logs.html#sso) | Single sign-on settings for the account. |\n| [unityCatalog](https://docs.databricks.com/admin/account-settings/audit-logs.html#uc) | Actions performed in Unity Catalog. This also includes Delta Sharing events; see [Delta Sharing events](https://docs.databricks.com/admin/account-settings/audit-logs.html#ds). | \n### Additional security monitoring services \nThere are additional services and associated actions for workspaces that use the [compliance security profile](https://docs.databricks.com/security/privacy/security-profile.html) (required for some compliance standards such as FedRAMP, PCI, and HIPAA) or [Enhanced security monitoring](https://docs.databricks.com/security/privacy/enhanced-security-monitoring.html).
\nThese workspace-level services appear in your logs only if you use the compliance security profile or enhanced security monitoring: \n| Service name | Description |\n| --- | --- |\n| [capsule8-alerts-dataplane](https://docs.databricks.com/admin/account-settings/audit-logs.html#capsule8) | Actions related to file integrity monitoring. |\n| [clamAVScanService-dataplanel](https://docs.databricks.com/admin/account-settings/audit-logs.html#clamav) | Actions related to antivirus monitoring. |\n| [monit](https://docs.databricks.com/admin/account-settings/audit-logs.html#monit) | Actions related to the process monitor. |\n| [syslog](https://docs.databricks.com/admin/account-settings/audit-logs.html#syslog) | Actions related to the system logs. |\n\n", "chunk_id": "bfffda33428234dca4fd104eca698e8d", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Audit log example schema\n\nIn Databricks, audit logs output events in a JSON format. The `serviceName` and `actionName` properties identify the event. The naming convention follows the Databricks [REST API](https://docs.databricks.com/api/workspace/introduction). \nThe following example is for a `createMetastoreAssignment` event.
\n```\n{\n\"version\":\"2.0\",\n\"auditLevel\":\"ACCOUNT_LEVEL\",\n\"timestamp\":1629775584891,\n\"orgId\":\"3049056262456431186970\",\n\"shardName\":\"test-shard\",\n\"accountId\":\"77636e6d-ac57-484f-9302-f7922285b9a5\",\n\"sourceIPAddress\":\"10.2.91.100\",\n\"userAgent\":\"curl/7.64.1\",\n\"sessionId\":\"f836a03a-d360-4792-b081-baba525324312\",\n\"userIdentity\":{\n\"email\":\"crampton.rods@email.com\",\n\"subjectName\":null\n},\n\"serviceName\":\"unityCatalog\",\n\"actionName\":\"createMetastoreAssignment\",\n\"requestId\":\"ServiceMain-da7fa5878f40002\",\n\"requestParams\":{\n\"workspace_id\":\"30490590956351435170\",\n\"metastore_id\":\"abc123456-8398-4c25-91bb-b000b08739c7\",\n\"default_catalog_name\":\"main\"\n},\n\"response\":{\n\"statusCode\":200,\n\"errorMessage\":null,\n\"result\":null\n},\n\"MAX_LOG_MESSAGE_LENGTH\":16384\n}\n\n``` \n### Audit log schema considerations \n* If actions take a long time, the request and response are logged separately but the request and response pair have the same `requestId`.\n* Automated actions, such as resizing a cluster due to autoscaling or launching a job due to scheduling, are performed by the user `System-User`.\n* The `requestParams` field is subject to truncation. If the size of its JSON representation exceeds 100 KB, values are truncated and the string `... truncated` is appended to truncated entries. In rare cases where a truncated map is still larger than 100 KB, a single `TRUNCATED` key with an empty value is present instead.\n\n", "chunk_id": "52371b52623a0d055f4cfa55773489db", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Account events\n\nThe following are `accounts` events logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `accounts` | `activateUser` | A user is reactivated after being deactivated. 
See [Deactivate users in workspace](https://docs.databricks.com/admin/users-groups/users.html#deactivate-user-workspace). | * `targetUserName` * `endpoint` * `targetUserId` |\n| `accounts` | `add` | A user is added to a Databricks workspace. | * `targetUserName` * `endpoint` * `targetUserId` |\n| `accounts` | `addPrincipalToGroup` | A user is added to a workspace-level group. | * `targetGroupId` * `endpoint` * `targetUserId` * `targetGroupName` * `targetUserName` |\n| `accounts` | `addX509` | A user account is added using an X509 certificate for authentication | |\n| `accounts` | `certLogin` | A user logs in to Databricks using X509 certification. | * `user` |\n| `accounts` | `changeDatabricksSqlAcl` | A user\u2019s Databricks SQL permissions are changed. | * `shardName` * `targetUserId` * `resourceId` * `aclPermissionSet` |\n| `accounts` | `changeDatabricksWorkspaceAcl` | Permissions to a workspace are changed. | * `shardName` * `targetUserId` * `resourceId` * `aclPermissionSet` |\n| `accounts` | `changeDbTokenAcl` | When permissions on a token are changed. | * `shardName` * `targetUserId` * `resourceId` * `aclPermissionSet` |\n| `accounts` | `changePassword` | A user\u2019s password is changed. | * `newPasswordSource` * `targetUserId` * `serviceSource` * `wasPasswordChanged` * `userId` |\n| `accounts` | `changePasswordAcl` | Password changing permissions are changed in the account. | * `shardName` * `targetUserId` * `resourceId` * `aclPermissionSet` |\n| `accounts` | `changeServicePrincipalAcls` | When a service principal\u2019s permissions are changed. | * `shardName` * `targetServicePrincipal` * `resourceId` * `aclPermissionSet` |\n| `accounts` | `createGroup` | A workspace-level group is created. | * `endpoint` * `targetGroupId` * `targetGroupName` |\n| `accounts` | `createIpAccessList` | An IP access list is added to the workspace. | * `ipAccessListId` * `userId` |\n| `accounts` | `deactivateUser` | A user is deactivated in the workspace. 
See [Deactivate users in workspace](https://docs.databricks.com/admin/users-groups/users.html#deactivate-user-workspace). | * `targetUserName` * `endpoint` * `targetUserId` |\n| `accounts` | `delete` | A user is deleted from the Databricks workspace. | * `targetUserId` * `targetUserName` * `endpoint` |\n| `accounts` | `deleteIpAccessList` | An IP access list is deleted from the workspace. | * `ipAccessListId` * `userId` |\n| `accounts` | `garbageCollectDbToken` | A user runs a garbage collect command on expired tokens. | * `tokenExpirationTime` * `tokenClientId` * `userId` * `tokenCreationTime` * `tokenFirstAccessed` |\n| `accounts` | `generateDbToken` | When someone generates a token from User Settings or when the service generates the token. | * `tokenExpirationTime` * `tokenCreatedBy` * `tokenHash` * `userId` |\n| `accounts` | `IpAccessDenied` | A user attempts to connect to the service through a denied IP. | * `path` * `userName` |\n| `accounts` | `ipAccessListQuotaExceeded` | | * `userId` |\n| `accounts` | `jwtLogin` | User logs into Databricks using a JWT. | * `user` |\n| `accounts` | `login` | User logs into the workspace. | * `user` |\n| `accounts` | `logout` | User logs out of the workspace. | * `user` |\n| `accounts` | `mfaAddKey` | User registers a new security key. | |\n| `accounts` | `mfaDeleteKey` | User deletes a security key. | * `id` |\n| `accounts` | `mfaLogin` | User logs into Databricks using MFA. | * `user` |\n| `accounts` | `oidcTokenAuthorization` | When an API call is authorized through a generic OIDC/OAuth token. | * `user` |\n| `accounts` | `passwordVerifyAuthentication` | | * `user` |\n| `accounts` | `reachMaxQuotaDbToken` | When the current number of non-expired tokens exceeds the token quota | |\n| `accounts` | `removeAdmin` | A user is revoked of workspace admin permissions. | * `targetUserName` * `endpoint` * `targetUserId` |\n| `accounts` | `removeGroup` | A group is removed from the workspace. 
| * `targetGroupId` * `targetGroupName` * `endpoint` |\n| `accounts` | `removePrincipalFromGroup` | A user is removed from a group. | * `targetGroupId` * `endpoint` * `targetUserId` * `targetGroupName` * `targetUserName` |\n| `accounts` | `resetPassword` | A user\u2019s password is reset. | * `serviceSource` * `userId` * `endpoint` * `targetUserId` * `targetUserName` * `wasPasswordChanged` * `newPasswordSource` |\n| `accounts` | `revokeDbToken` | A user\u2019s token is dropped from a workspace. Can be triggered by a user being removed from the Databricks account. | * `userId` |\n| `accounts` | `samlLogin` | User logs in to Databricks through SAML SSO. | * `user` |\n| `accounts` | `setAdmin` | A user is granted account admin permissions. | * `endpoint` * `targetUserName` * `targetUserId` |\n| `accounts` | `tokenLogin` | A user logs into Databricks using a token. | * `tokenId` * `user` |\n| `accounts` | `updateIpAccessList` | An IP access list is changed. | * `ipAccessListId` * `userId` |\n| `accounts` | `updateUser` | An account admin updates a user\u2019s account. | * `targetUserName` * `endpoint` * `targetUserId` |\n| `accounts` | `validateEmail` | When a user validates their email after account creation. | * `endpoint` * `targetUserName` * `targetUserId` |\n\n", "chunk_id": "9dcfb6549375e2c3ad9420c675400edc", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Clusters events\n\nThe following are `cluster` events logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `clusters` | `changeClusterAcl` | A user changes the cluster ACL. | * `shardName` * `aclPermissionSet` * `targetUserId` * `resourceId` |\n| `clusters` | `create` | A user creates a cluster. 
| * `cluster_log_conf` * `num_workers` * `enable_elastic_disk` * `driver_node_type_id` * `start_cluster` * `docker_image` * `ssh_public_keys` * `aws_attributes` * `acl_path_prefix` * `node_type_id` * `instance_pool_id` * `spark_env_vars` * `init_scripts` * `spark_version` * `cluster_source` * `autotermination_minutes` * `cluster_name` * `autoscale` * `custom_tags` * `cluster_creator` * `enable_local_disk_encryption` * `idempotency_token` * `spark_conf` * `organization_id` * `no_driver_daemon` * `user_id` * `virtual_cluster_size` * `apply_policy_default_values` * `data_security_mode` * `runtime_engine` |\n| `clusters` | `createResult` | Results from cluster creation. In conjunction with `create`. | * `clusterName` * `clusterState` * `clusterId` * `clusterWorkers` * `clusterOwnerUserId` |\n| `clusters` | `delete` | A cluster is terminated. | * `cluster_id` |\n| `clusters` | `deleteResult` | Results from cluster termination. In conjunction with `delete`. | * `clusterName` * `clusterState` * `clusterId` * `clusterWorkers` * `clusterOwnerUserId` |\n| `clusters` | `edit` | A user makes changes to cluster settings. This logs all changes except for changes in cluster size or autoscaling behavior. | * `cluster_log_conf` * `num_workers` * `enable_elastic_disk` * `driver_node_type_id` * `start_cluster` * `docker_image` * `ssh_public_keys` * `aws_attributes` * `acl_path_prefix` * `node_type_id` * `instance_pool_id` * `spark_env_vars` * `init_scripts` * `spark_version` * `cluster_source` * `autotermination_minutes` * `cluster_name` * `autoscale` * `custom_tags` * `cluster_creator` * `enable_local_disk_encryption` * `idempotency_token` * `spark_conf` * `organization_id` * `no_driver_daemon` * `user_id` * `virtual_cluster_size` * `apply_policy_default_values` * `data_security_mode` * `runtime_engine` |\n| `clusters` | `permanentDelete` | A cluster is deleted from the UI. | * `cluster_id` |\n| `clusters` | `resize` | Cluster resizes. 
This is logged on running clusters where the only property that changes is either the cluster size or autoscaling behavior. | * `cluster_id` * `num_workers` * `autoscale` |\n| `clusters` | `resizeResult` | Results from cluster resize. In conjunction with `resize`. | * `clusterName` * `clusterState` * `clusterId` * `clusterWorkers` * `clusterOwnerUserId` |\n| `clusters` | `restart` | A user restarts a running cluster. | * `cluster_id` |\n| `clusters` | `restartResult` | Results from cluster restart. In conjunction with `restart`. | * `clusterName` * `clusterState` * `clusterId` * `clusterWorkers` * `clusterOwnerUserId` |\n| `clusters` | `start` | A user starts a cluster. | * `init_scripts_safe_mode` * `cluster_id` |\n| `clusters` | `startResult` | Results from cluster start. In conjunction with `start`. | * `clusterName` * `clusterState` * `clusterId` * `clusterWorkers` * `clusterOwnerUserId` |\n\n", "chunk_id": "22f145c6123ab8667af87b7f2be69d2b", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Cluster libraries events\n\nThe following are `clusterLibraries` events logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `clusterLibraries` | `installLibraries` | User installs a library on a cluster. | * `cluster_id` * `libraries` |\n| `clusterLibraries` | `uninstallLibraries` | User uninstalls a library on a cluster. | * `cluster_id` * `libraries` |\n| `clusterLibraries` | `installLibraryOnAllClusters` | A workspace admin schedules a library to install on all clusters. | * `user` * `library` |\n| `clusterLibraries` | `uninstallLibraryOnAllClusters` | A workspace admin removes a library from the list to install on all clusters. | * `user` * `library` |\n\n### Audit log reference\n#### Cluster policy events\n\nThe following are `clusterPolicies` events logged at the workspace level.
\n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `clusterPolicies` | `create` | A user created a cluster policy. | * `name` |\n| `clusterPolicies` | `edit` | A user edited a cluster policy. | * `policy_id` * `name` |\n| `clusterPolicies` | `delete` | A user deleted a cluster policy. | * `policy_id` |\n| `clusterPolicies` | `changeClusterPolicyAcl` | A workspace admin changes permissions for a cluster policy. | * `shardName` * `targetUserId` * `resourceId` * `aclPermissionSet` |\n\n", "chunk_id": "5744c91e5c99bd668d4e63dbaf6a5732", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Dashboards events\n\nThe following are `dashboards` events logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `dashboards` | `createDashboard` | A user creates a new Lakeview dashboard using the UI or API. | * `dashboard_id` |\n| `dashboards` | `updateDashboard` | A user makes an update to a Lakeview dashboard using the UI or API. | * `dashboard_id` |\n| `dashboards` | `cloneDashboard` | A user clones a Lakeview dashboard. | * `source_dashboard_id` * `new_dashboard_id` |\n| `dashboards` | `publishDashboard` | A user publishes a Lakeview dashboard with or without embedded credentials using the UI or API. | * `dashboard_id` * `credentials_embedded` * `warehouse_id` |\n| `dashboards` | `unpublishDashboard` | A user unpublishes a published Lakeview dashboard using the UI or API. | * `dashboard_id` |\n| `dashboards` | `trashDashboard` | A user moves a Lakeview dashboard to the trash using the UI or API. | * `dashboard_id` |\n| `dashboards` | `restoreDashboard` | A user restores a Lakeview dashboard from the trash. | * `dashboard_id` |\n| `dashboards` | `migrateDashboard` | A user migrates a DBSQL dashboard to a Lakeview dashboard. 
| * `source_dashboard_id` * `new_dashboard_id` |\n| `dashboards` | `createSchedule` | A user creates an email subscription schedule. | * `dashboard_id` * `schedule_id` |\n| `dashboards` | `updateSchedule` | A user makes an update to a Lakeview dashboard\u2019s schedule. | * `dashboard_id` * `schedule_id` |\n| `dashboards` | `deleteSchedule` | A user deletes a Lakeview dashboard\u2019s schedule. | * `dashboard_id` * `schedule_id` |\n| `dashboards` | `createSubscription` | A user subscribes an email destination to a Lakeview dashboard schedule. | * `dashboard_id` * `schedule_id` * `schedule` |\n| `dashboards` | `deleteSubscription` | A user deletes an email destination from a Lakeview dashboard schedule. | * `dashboard_id` * `schedule_id` |\n\n", "chunk_id": "473dac07a3f370f36e2f7eb08c5980b1", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Databricks SQL events\n\nThe following are `databrickssql` events logged at the workspace level. \nNote \nIf you manage your SQL warehouses using the legacy SQL endpoints API, your SQL warehouse audit events will have different action names. See [SQL endpoint logs](https://docs.databricks.com/admin/account-settings/audit-logs.html#endpoints). \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `databrickssql` | `addDashboardWidget` | A widget is added to a dashboard. | * `dashboardId` * `widgetId` |\n| `databrickssql` | `cancelQueryExecution` | A query execution is cancelled from the SQL editor UI. This does not include cancellations that originate from the Query History UI or Databricks SQL Execution API. | * `queryExecutionId` |\n| `databrickssql` | `changeWarehouseAcls` | A warehouse manager updates permissions on a SQL warehouse. 
| * `aclPermissionSet` * `resourceId` * `shardName` * `targetUserId` |\n| `databrickssql` | `changePermissions` | A user updates permissions on an object. | * `granteeAndPermission` * `objectId` * `objectType` |\n| `databrickssql` | `cloneDashboard` | A user clones a dashboard. | * `dashboardId` |\n| `databrickssql` | `commandSubmit` | Only in verbose audit logs. Generated when a command is submitted to a SQL warehouse, regardless of origin of the request. | * `warehouseId` * `commandId` * `validation` * `commandText` |\n| `databrickssql` | `commandFinish` | Only in verbose audit logs. Generated when a command on a SQL warehouse completes or is canceled, regardless of the origin of the cancellation request. | * `warehouseId` * `commandId` |\n| `databrickssql` | `createAlert` | A user creates an alert. | * `alertId` |\n| `databrickssql` | `createNotificationDestination` | A workspace admin creates a notification destination. | * `notificationDestinationId` * `notificationDestinationType` |\n| `databrickssql` | `createDashboard` | A user creates a dashboard. | * `dashboardId` |\n| `databrickssql` | `createDataPreviewDashboard` | A user creates a data preview dashboard. | * `dashboardId` |\n| `databrickssql` | `createWarehouse` | A user with the cluster create entitlement creates a SQL warehouse. | * `auto_resume` * `auto_stop_mins` * `channel` * `cluster_size` * `conf_pairs` * `custom_cluster_confs` * `enable_databricks_compute` * `enable_photon` * `enable_serverless_compute` * `instance_profile_arn` * `max_num_clusters` * `min_num_clusters` * `name` * `size` * `spot_instance_policy` * `tags` * `test_overrides` |\n| `databrickssql` | `createQuery` | A user creates a query by saving a query draft. | * `queryId` |\n| `databrickssql` | `createQueryDraft` | A user creates a query draft. | * `queryId` |\n| `databrickssql` | `createQuerySnippet` | A user creates a query snippet. 
| * `querySnippetId` |\n| `databrickssql` | `createSampleDashboard` | A user creates a sample dashboard. | * `sampleDashboardId` |\n| `databrickssql` | `createVisualization` | A user generates a visualization using the SQL editor. Excludes default results tables and visualizations in notebooks that utilize SQL warehouses. | * `queryId` * `visualizationId` |\n| `databrickssql` | `deleteAlert` | A user deletes an alert either from the alert interface or through API. Excludes deletions from the file browser UI. | * `alertId` |\n| `databrickssql` | `deleteNotificationDestination` | A workspace admin deletes a notification destination. | * `notificationDestinationId` |\n| `databrickssql` | `deleteDashboard` | A user deletes a dashboard either from the dashboard interface or through API. Excludes deletion via the file browser UI. | * `dashboardId` |\n| `databrickssql` | `deleteDashboardWidget` | A user deletes a dashboard widget. | * `widgetId` |\n| `databrickssql` | `deleteWarehouse` | A warehouse manager deletes a SQL warehouse. | * `id` |\n| `databrickssql` | `deleteQuery` | A user deletes a query, either from the query interface or through API. Excludes deletion via the file browser UI. | * `queryId` |\n| `databrickssql` | `deleteQueryDraft` | A user deletes a query draft. | * `queryId` |\n| `databrickssql` | `deleteQuerySnippet` | A user deletes a query snippet. | * `querySnippetId` |\n| `databrickssql` | `deleteVisualization` | A user deletes a visualization from a query in the SQL Editor. | * `visualizationId` |\n| `databrickssql` | `downloadQueryResult` | A user downloads a query result from the SQL Editor. Excludes downloads from dashboards. | * `fileType` * `queryId` * `queryResultId` |\n| `databrickssql` | `editWarehouse` | A warehouse manager makes edits to a SQL warehouse. 
| * `auto_stop_mins` * `channel` * `cluster_size` * `confs` * `enable_photon` * `enable_serverless_compute` * `id` * `instance_profile_arn` * `max_num_clusters` * `min_num_clusters` * `name` * `spot_instance_policy` * `tags` |\n| `databrickssql` | `executeAdhocQuery` | Generated by one of the following:* A user runs a query draft in the SQL editor * A query is executed from a visualization aggregation * A user loads a dashboard and executes underlying queries | * `dataSourceId` |\n| `databrickssql` | `executeSavedQuery` | A user runs a saved query. | * `queryId` |\n| `databrickssql` | `executeWidgetQuery` | Generated by any event that executes a query such that a dashboard panel refreshes. Some examples of applicable events include:* Refreshing a single panel * Refreshing an entire dashboard * Scheduled dashboard executions * Parameter or filter changes operating over more than 64,000 rows | * `widgetId` |\n| `databrickssql` | `favoriteDashboard` | A user favorites a dashboard. | * `dashboardId` |\n| `databrickssql` | `favoriteQuery` | A user favorites a query. | * `queryId` |\n| `databrickssql` | `forkQuery` | A user clones a query. | * `originalQueryId` * `queryId` |\n| `databrickssql` | `listQueries` | A user opens the query listing page or calls the list query API. | * `filter_by` * `include_metrics` * `max_results` * `page_token` |\n| `databrickssql` | `moveDashboardToTrash` | A user moves a dashboard to the trash. | * `dashboardId` |\n| `databrickssql` | `moveQueryToTrash` | A user moves a query to the trash. | * `queryId` |\n| `databrickssql` | `muteAlert` | A user mutes an alert via the API. | * `alertId` |\n| `databrickssql` | `restoreDashboard` | A user restores a dashboard from the trash. | * `dashboardId` |\n| `databrickssql` | `restoreQuery` | A user restores a query from the trash. | * `queryId` |\n| `databrickssql` | `setWarehouseConfig` | A warehouse manager sets the configuration for a SQL warehouse. 
| * `data_access_config` * `enable_serverless_compute` * `instance_profile_arn` * `security_policy` * `serverless_agreement` * `sql_configuration_parameters` * `try_create_databricks_managed_starter_warehouse` |\n| `databrickssql` | `snapshotDashboard` | A user requests a snapshot of a dashboard. Includes scheduled dashboard snapshots. | * `dashboardId` |\n| `databrickssql` | `startWarehouse` | A SQL warehouse is started. | * `id` |\n| `databrickssql` | `stopWarehouse` | A warehouse manager stops a SQL warehouse. Excludes autostopped warehouses. | * `id` |\n| `databrickssql` | `transferObjectOwnership` | A workspace admin transfers the ownership of a dashboard, query, or alert to an active user. | * `newOwner` * `objectId` * `objectType` |\n| `databrickssql` | `unfavoriteDashboard` | A user removes a dashboard from their favorites. | * `dashboardId` |\n| `databrickssql` | `unfavoriteQuery` | A user removes a query from their favorites. | * `queryId` |\n| `databrickssql` | `unmuteAlert` | A user unmutes an alert via the API. | * `alertId` |\n| `databrickssql` | `updateAlert` | A user makes updates to an alert. | * `alertId` * `queryId` |\n| `databrickssql` | `updateNotificationDestination` | A workspace admin makes an update to a notification destination. | * `notificationDestinationId` |\n| `databrickssql` | `updateDashboardWidget` | A user makes an update to a dashboard widget. Excludes changes to axis scales. Examples of applicable updates include:* Change to widget size or placement * Adding or removing widget parameters | * `widgetId` |\n| `databrickssql` | `updateDashboard` | A user makes an update to a dashboard property. Excludes changes to schedules and subscriptions. Examples of applicable updates include:* Change in dashboard name * Change to the SQL warehouse * Change to **Run As** settings | * `dashboardId` |\n| `databrickssql` | `updateOrganizationSetting` | A workspace admin makes updates to the workspace\u2019s SQL settings. 
| * `has_configured_data_access` * `has_explored_sql_warehouses` * `has_granted_permissions` |\n| `databrickssql` | `updateQuery` | A user makes an update to a query. | * `queryId` |\n| `databrickssql` | `updateQueryDraft` | A user makes an update to a query draft. | * `queryId` |\n| `databrickssql` | `updateQuerySnippet` | A user makes an update to a query snippet. | * `querySnippetId` |\n| `databrickssql` | `updateVisualization` | A user updates a visualization from either the SQL Editor or the dashboard. | * `visualizationId` |\n\n", "chunk_id": "c7c00faea7a0d8176e7052c31111a93f", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### DBFS events\n\nThe following tables include `dbfs` events logged at the workspace level. \nThere are two types of DBFS events: API calls and operational events. \n### DBFS API events \nThe following DBFS audit events are only logged when written through the [DBFS REST API](https://docs.databricks.com/api/workspace/dbfs). \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `dbfs` | `addBlock` | User appends a block of data to the stream. This is used in conjunction with dbfs/create to stream data to DBFS. | * `handle` * `data_length` |\n| `dbfs` | `create` | User opens a stream to write a file to DBFS. | * `path` * `bufferSize` * `overwrite` |\n| `dbfs` | `delete` | User deletes the file or directory from DBFS. | * `recursive` * `path` |\n| `dbfs` | `mkdirs` | User creates a new DBFS directory. | * `path` |\n| `dbfs` | `move` | User moves a file from one location to another location within DBFS. | * `dst` * `source_path` * `src` * `destination_path` |\n| `dbfs` | `put` | User uploads a file to DBFS using a multipart form post. | * `path` * `overwrite` | \n### DBFS operational events \nThe following DBFS audit events occur at the compute plane. 
\n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `dbfs` | `mount` | User creates a mount point at a certain DBFS location. | * `mountPoint` * `owner` |\n| `dbfs` | `unmount` | User removes a mount point at a certain DBFS location. | * `mountPoint` |\n\n", "chunk_id": "a439206f0913d2f85ce53a753359fb5a", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Delta pipelines events\n\n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `deltaPipelines` | `changePipelineAcls` | A user changes permissions on a pipeline. | * `shardId` * `targetUserId` * `resourceId` * `aclPermissionSet` |\n| `deltaPipelines` | `create` | A user creates a Delta Live Tables pipeline. | * `allow_duplicate_names` * `clusters` * `configuration` * `continuous` * `development` * `dry_run` * `id` * `libraries` * `name` * `storage` * `target` * `channel` * `edition` * `photon` |\n| `deltaPipelines` | `delete` | A user deletes a Delta Live Tables pipeline. | * `pipeline_id` |\n| `deltaPipelines` | `edit` | A user edits a Delta Live Tables pipeline. | * `allow_duplicate_names` * `clusters` * `configuration` * `continuous` * `development` * `expected_last_modified` * `id` * `libraries` * `name` * `pipeline_id` * `storage` * `target` * `channel` * `edition` * `photon` |\n| `deltaPipelines` | `startUpdate` | A user restarts a Delta Live Tables pipeline. | * `cause` * `full_refresh` * `job_task` * `pipeline_id` |\n| `deltaPipelines` | `stop` | A user stops a Delta Live Tables pipeline. | * `pipeline_id` |\n\n", "chunk_id": "08820e7518cc07ecd30373fd582c97e1", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Feature store events\n\nThe following `featureStore` events are logged at the workspace level. 
\n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `featureStore` | `addConsumer` | A consumer is added to the feature store. | * `features` * `job_run` * `notebook` |\n| `featureStore` | `addDataSources` | A data source is added to a feature table. | * `feature_table` * `paths, tables` |\n| `featureStore` | `addProducer` | A producer is added to a feature table. | * `feature_table` * `job_run` * `notebook` |\n| `featureStore` | `changeFeatureTableAcl` | Permissions are changed in a feature table. | * `aclPermissionSet` * `resourceId` * `shardName` * `targetUserId` |\n| `featureStore` | `createFeatureTable` | A feature table is created. | * `description` * `name` * `partition_keys` * `primary_keys` * `timestamp_keys` |\n| `featureStore` | `createFeatures` | Features are created in a feature table. | * `feature_table` * `features` |\n| `featureStore` | `deleteFeatureTable` | A feature table is deleted. | * `name` |\n| `featureStore` | `deleteTags` | Tags are deleted from a feature table. | * `feature_table_id` * `keys` |\n| `featureStore` | `getConsumers` | A user makes a call to get the consumers in a feature table. | * `feature_table` |\n| `featureStore` | `getFeatureTable` | A user makes a call to get feature tables. | * `name` |\n| `featureStore` | `getFeatureTablesById` | A user makes a call to get feature table IDs. | * `ids` |\n| `featureStore` | `getFeatures` | A user makes a call to get features. | * `feature_table` * `max_results` |\n| `featureStore` | `getModelServingMetadata` | A user makes a call to get Model Serving metadata. | * `feature_table_features` |\n| `featureStore` | `getOnlineStore` | A user makes a call to get online store details. | * `cloud` * `feature_table` * `online_table` * `store_type` |\n| `featureStore` | `getTags` | A user makes a call to get tags for a feature table. | * `feature_table_id` |\n| `featureStore` | `publishFeatureTable` | A feature table is published. 
| * `cloud` * `feature_table` * `host` * `online_table` * `port` * `read_secret_prefix` * `store_type` * `write_secret_prefix` |\n| `featureStore` | `searchFeatureTables` | A user searches for feature tables. | * `max_results` * `page_token` * `text` |\n| `featureStore` | `setTags` | Tags are added to a feature table. | * `feature_table_id` * `tags` |\n| `featureStore` | `updateFeatureTable` | A feature table is updated. | * `description` * `name` |\n\n", "chunk_id": "011bd6758bef8d1b31647ec95661b7bd", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Files API events\n\nThe following `filesystem` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `filesystem` | `filesGet` | User downloads a file. | * `path` * `transferredSize` |\n| `filesystem` | `filesPut` | User uploads a file. | * `path` * `receivedSize` |\n| `filesystem` | `filesDelete` | User deletes a file. | * `path` |\n| `filesystem` | `filesHead` | User gets information about a file. | * `path` |\n\n### Audit log reference\n#### Genie events\n\nThe following `genie` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `genie` | `databricksAccess` | Databricks personnel are authorized to access a customer environment. | * `duration` * `approver` * `reason` * `authType` * `user` |\n\n", "chunk_id": "e8cafcd997ee72d983841f0ccde4d4c0", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Git credential events\n\nThe following `gitCredentials` events are logged at the workspace level. 
\n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `gitCredentials` | `getGitCredential` | A user gets a git credential. | * `id` |\n| `gitCredentials` | `listGitCredentials` | A user lists all git credentials. | none |\n| `gitCredentials` | `deleteGitCredential` | A user deletes a git credential. | * `id` |\n| `gitCredentials` | `updateGitCredential` | A user updates a git credential. | * `id` * `git_provider` * `git_username` |\n| `gitCredentials` | `createGitCredential` | A user creates a git credential. | * `git_provider` * `git_username` |\n\n### Audit log reference\n#### Global init scripts events\n\nThe following `globalInitScripts` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `globalInitScripts` | `create` | A workspace admin creates a global initialization script. | * `name` * `position` * `script-SHA256` * `enabled` |\n| `globalInitScripts` | `update` | A workspace admin updates a global initialization script. | * `script_id` * `name` * `position` * `script-SHA256` * `enabled` |\n| `globalInitScripts` | `delete` | A workspace admin deletes a global initialization script. | * `script_id` |\n\n", "chunk_id": "a9a5efa817b5798d1bb22a85dbd52d89", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Groups events\n\nThe following `groups` events are logged at the workspace level. These actions are related to legacy ACL groups. For actions related to account- and workspace-level groups, see [Account events](https://docs.databricks.com/admin/account-settings/audit-logs.html#accounts) and [Account-level account events](https://docs.databricks.com/admin/account-settings/audit-logs.html#account-accounts). 
\n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `groups` | `addPrincipalToGroup` | An admin adds a user to a group. | * `user_name` * `parent_name` |\n| `groups` | `createGroup` | An admin creates a group. | * `group_name` |\n| `groups` | `getGroupMembers` | An admin views group members. | * `group_name` |\n| `groups` | `getGroups` | An admin views a list of groups. | none |\n| `groups` | `getInheritedGroups` | An admin views inherited groups. | none |\n| `groups` | `removeGroup` | An admin removes a group. | * `group_name` |\n\n### Audit log reference\n#### IAM role events\n\nThe following `iamRole` event is logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `iamRole` | `changeIamRoleAcl` | A workspace admin changes permissions for an IAM role. | * `targetUserId` * `shardName` * `resourceId` * `aclPermissionSet` |\n\n", "chunk_id": "8cf40473d946e3e090576800e0f3e9b7", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Ingestion events\n\nThe following `ingestion` event is logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `ingestion` | `proxyFileUpload` | A user uploads a file to their Databricks workspace. | * `x-databricks-content-length-0` * `x-databricks-total-files` |\n\n### Audit log reference\n#### Instance pool events\n\nThe following `instancePools` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `instancePools` | `changeInstancePoolAcl` | A user changes an instance pool\u2019s permissions. | * `shardName` * `resourceId` * `targetUserId` * `aclPermissionSet` |\n| `instancePools` | `create` | A user creates an instance pool. 
| * `enable_elastic_disk` * `preloaded_spark_versions` * `idle_instance_autotermination_minutes` * `instance_pool_name` * `node_type_id` * `custom_tags` * `max_capacity` * `min_idle_instances` * `aws_attributes` |\n| `instancePools` | `delete` | A user deletes an instance pool. | * `instance_pool_id` |\n| `instancePools` | `edit` | A user edits an instance pool. | * `instance_pool_name` * `idle_instance_autotermination_minutes` * `min_idle_instances` * `preloaded_spark_versions` * `max_capacity` * `enable_elastic_disk` * `node_type_id` * `instance_pool_id` * `aws_attributes` |\n\n", "chunk_id": "b6884cd9ac469ac877b0ffc38aa2b040", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Job events\n\nThe following `jobs` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `jobs` | `cancel` | A job run is cancelled. | * `run_id` |\n| `jobs` | `cancelAllRuns` | A user cancels all runs on a job. | * `job_id` |\n| `jobs` | `changeJobAcl` | A user updates permissions on a job. | * `shardName` * `aclPermissionSet` * `resourceId` * `targetUserId` |\n| `jobs` | `create` | A user creates a job. | * `spark_jar_task` * `email_notifications` * `notebook_task` * `spark_submit_task` * `timeout_seconds` * `libraries` * `name` * `spark_python_task` * `job_type` * `new_cluster` * `existing_cluster_id` * `max_retries` * `schedule` * `run_as` |\n| `jobs` | `delete` | A user deletes a job. | * `job_id` |\n| `jobs` | `deleteRun` | A user deletes a job run. | * `run_id` |\n| `jobs` | `getRunOutput` | A user makes an API call to get a run output. | * `run_id` * `is_from_webapp` |\n| `jobs` | `repairRun` | A user repairs a job run. | * `run_id` * `latest_repair_id` * `rerun_tasks` |\n| `jobs` | `reset` | A job is reset. 
| * `job_id` * `new_settings` |\n| `jobs` | `resetJobAcl` | A user requests the change of a job\u2019s permissions. | * `grants` * `job_id` |\n| `jobs` | `runCommand` | Available when verbose audit logs are enabled. Emitted after a command in a notebook is executed by a job run. A command corresponds to a cell in a notebook. | * `jobId` * `runId` * `notebookId` * `executionTime` * `status` * `commandId` * `commandText` |\n| `jobs` | `runFailed` | A job run fails. | * `jobClusterType` * `jobTriggerType` * `jobId` * `jobTaskType` * `runId` * `jobTerminalState` * `idInJob` * `orgId` * `runCreatorUserName` |\n| `jobs` | `runNow` | A user triggers an on-demand job run. | * `notebook_params` * `job_id` * `jar_params` * `workflow_context` |\n| `jobs` | `runStart` | Emitted when a job run starts after validation and cluster creation. The request parameters emitted from this event depend on the type of tasks in the job. In addition to the parameters listed, they can include:* `dashboardId` (for a SQL dashboard task) * `filePath` (for a SQL file task) * `notebookPath` (for a notebook task) * `mainClassName` (for a Spark JAR task) * `pythonFile` (for a Spark Python task) * `projectDirectory` (for a dbt task) * `commands` (for a dbt task) * `packageName` (for a Python wheel task) * `entryPoint` (for a Python wheel task) * `pipelineId` (for a pipeline task) * `queryIds` (for a SQL query task) * `alertId` (for a SQL alert task) | * `taskDependencies` * `multitaskParentRunId` * `orgId` * `idInJob` * `jobId` * `jobTerminalState` * `taskKey` * `jobTriggerType` * `jobTaskType` * `runId` * `runCreatorUserName` |\n| `jobs` | `runSucceeded` | A job run is successful. | * `idInJob` * `jobId` * `jobTriggerType` * `orgId` * `runId` * `jobClusterType` * `jobTaskType` * `jobTerminalState` * `runCreatorUserName` |\n| `jobs` | `runTriggered` | A job is triggered automatically according to its schedule or trigger. 
| * `jobId` * `jobTriggeredType` * `runId` |\n| `jobs` | `sendRunWebhook` | A webhook is sent either when the job begins, completes, or fails. | * `orgId` * `jobId` * `jobWebhookId` * `jobWebhookEvent` * `runId` |\n| `jobs` | `setTaskValue` | A user sets values for a task. | * `run_id` * `key` |\n| `jobs` | `submitRun` | A user submits a one-time run via the API. | * `shell_command_task` * `run_name` * `spark_python_task` * `existing_cluster_id` * `notebook_task` * `timeout_seconds` * `libraries` * `new_cluster` * `spark_jar_task` |\n| `jobs` | `update` | A user edits a job\u2019s settings. | * `job_id` * `fields_to_remove` * `new_settings` * `is_from_dlt` |\n\n", "chunk_id": "bd3e0d89a9623a28ce453c10f55198c6", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Marketplace consumer events\n\nThe following `marketplaceConsumer` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `marketplaceConsumer` | `getDataProduct` | A user gets access to a data product through the Databricks Marketplace. | * `listing_id` * `listing_name` * `share_name` * `catalog_name` * `request_context`: Array of information about the account and metastore that got access to the data product |\n| `marketplaceConsumer` | `requestDataProduct` | A user requests access to a data product that requires provider approval. | * `listing_id` * `listing_name` * `catalog_name` * `request_context`: Array of information about the account and metastore requesting access to the data product |\n\n", "chunk_id": "73e25eee102621d1c72103faaa3d019c", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Marketplace provider events\n\nThe following `marketplaceProvider` events are logged at the workspace level. 
\n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `marketplaceProvider` | `createListing` | A metastore admin creates a listing in their provider profile. | * `listing`: Array of details about the listing * `request_context`: Array of information about the provider\u2019s account and metastore |\n| `marketplaceProvider` | `updateListing` | A metastore admin makes an update to a listing in their provider profile. | * `id` * `listing`: Array of details about the listing * `request_context`: Array of information about the provider\u2019s account and metastore |\n| `marketplaceProvider` | `deleteListing` | A metastore admin deletes a listing in their provider profile. | * `id` * `request_context`: Array of details about the provider\u2019s account and metastore |\n| `marketplaceProvider` | `updateConsumerRequestStatus` | A metastore admin approves or denies a data product request. | * `listing_id` * `request_id` * `status` * `reason` * `share`: Array of information about the share * `request_context`: Array of information about the provider\u2019s account and metastore |\n| `marketplaceProvider` | `createProviderProfile` | A metastore admin creates a provider profile. | * `provider`: Array of information about the provider * `request_context`: Array of information about the provider\u2019s account and metastore |\n| `marketplaceProvider` | `updateProviderProfile` | A metastore admin makes an update to their provider profile. | * `id` * `provider`: Array of information about the provider * `request_context`: Array of information about the provider\u2019s account and metastore |\n| `marketplaceProvider` | `deleteProviderProfile` | A metastore admin deletes their provider profile. | * `id` * `request_context`: Array of information about the provider\u2019s account and metastore |\n| `marketplaceProvider` | `uploadFile` | A provider uploads a file to their provider profile. 
| * `request_context`: Array of information about the provider\u2019s account and metastore * `marketplace_file_type` * `display_name` * `mime_type` * `file_parent`: Array of file parent details |\n| `marketplaceProvider` | `deleteFile` | A provider deletes a file from their provider profile. | * `file_id` * `request_context`: Array of information about the provider\u2019s account and metastore |\n\n", "chunk_id": "4a15c80049208068dd248bb57d51b56d", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### MLflow artifacts with ACL events\n\nThe following `mlflowAcledArtifact` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `mlflowAcledArtifact` | `readArtifact` | A user makes a call to read an artifact. | * `artifactLocation` * `experimentId` * `runId` |\n| `mlflowAcledArtifact` | `writeArtifact` | A user makes a call to write to an artifact. | * `artifactLocation` * `experimentId` * `runId` |\n\n### Audit log reference\n#### MLflow experiment events\n\nThe following `mlflowExperiment` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `mlflowExperiment` | `createMlflowExperiment` | A user creates an MLflow experiment. | * `experimentId` * `path` * `experimentName` |\n| `mlflowExperiment` | `deleteMlflowExperiment` | A user deletes an MLflow experiment. | * `experimentId` * `path` * `experimentName` |\n| `mlflowExperiment` | `moveMlflowExperiment` | A user moves an MLflow experiment. | * `newPath` * `experimentId` * `oldPath` |\n| `mlflowExperiment` | `restoreMlflowExperiment` | A user restores an MLflow experiment. | * `experimentId` * `path` * `experimentName` |\n| `mlflowExperiment` | `renameMlflowExperiment` | A user renames an MLflow experiment. 
| * `oldName` * `newName` * `experimentId` * `parentPath` |\n\n", "chunk_id": "c010bac9729541d907e1fc50af3800a6", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### MLflow model registry events\n\nThe following `mlflowModelRegistry` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `modelRegistry` | `approveTransitionRequest` | A user approves a model version stage transition request. | * `name` * `version` * `stage` * `archive_existing_versions` |\n| `modelRegistry` | `changeRegisteredModelAcl` | A user updates permissions for a registered model. | * `registeredModelId` * `userId` |\n| `modelRegistry` | `createComment` | A user posts a comment on a model version. | * `name` * `version` |\n| `modelRegistry` | `createModelVersion` | A user creates a model version. | * `name` * `source` * `run_id` * `tags` * `run_link` |\n| `modelRegistry` | `createRegisteredModel` | A user creates a new registered model. | * `name` * `tags` |\n| `modelRegistry` | `createRegistryWebhook` | A user creates a webhook for Model Registry events. | * `orgId` * `registeredModelId` * `events` * `description` * `status` * `creatorId` * `httpUrlSpec` |\n| `modelRegistry` | `createTransitionRequest` | A user creates a model version stage transition request. | * `name` * `version` * `stage` |\n| `modelRegistry` | `deleteComment` | A user deletes a comment on a model version. | * `id` |\n| `modelRegistry` | `deleteModelVersion` | A user deletes a model version. | * `name` * `version` |\n| `modelRegistry` | `deleteModelVersionTag` | A user deletes a model version tag. | * `name` * `version` * `key` |\n| `modelRegistry` | `deleteRegisteredModel` | A user deletes a registered model. | * `name` |\n| `modelRegistry` | `deleteRegisteredModelTag` | A user deletes the tag for a registered model. 
| * `name` * `key` |\n| `modelRegistry` | `deleteRegistryWebhook` | A user deletes a Model Registry webhook. | * `orgId` * `webhookId` |\n| `modelRegistry` | `deleteTransitionRequest` | A user cancels a model version stage transition request. | * `name` * `version` * `stage` * `creator` |\n| `modelRegistry` | `finishCreateModelVersionAsync` | Asynchronous model copying is completed. | * `name` * `version` |\n| `modelRegistry` | `generateBatchInferenceNotebook` | A batch inference notebook is autogenerated. | * `userId` * `orgId` * `modelName` * `inputTableOpt` * `outputTablePathOpt` * `stageOrVersion` * `modelVersionEntityOpt` * `notebookPath` |\n| `modelRegistry` | `generateDltInferenceNotebook` | An inference notebook for a Delta Live Tables pipeline is autogenerated. | * `userId` * `orgId` * `modelName` * `inputTable` * `outputTable` * `stageOrVersion` * `notebookPath` |\n| `modelRegistry` | `getModelVersionDownloadUri` | A user gets a URI to download the model version. | * `name` * `version` |\n| `modelRegistry` | `getModelVersionSignedDownloadUri` | A user gets a URI to download a signed model version. | * `name` * `version` * `path` |\n| `modelRegistry` | `listModelArtifacts` | A user makes a call to list a model\u2019s artifacts. | * `name` * `version` * `path` * `page_token` |\n| `modelRegistry` | `listRegistryWebhooks` | A user makes a call to list all registry webhooks in the model. | * `orgId` * `registeredModelId` |\n| `modelRegistry` | `rejectTransitionRequest` | A user rejects a model version stage transition request. | * `name` * `version` * `stage` |\n| `modelRegistry` | `renameRegisteredModel` | A user renames a registered model. | * `name` * `new_name` |\n| `modelRegistry` | `setEmailSubscriptionStatus` | A user updates the email subscription status for a registered model. | |\n| `modelRegistry` | `setModelVersionTag` | A user sets a model version tag. 
| * `name` * `version` * `key` * `value` |\n| `modelRegistry` | `setRegisteredModelTag` | A user sets a registered model tag. | * `name` * `key` * `value` |\n| `modelRegistry` | `setUserLevelEmailSubscriptionStatus` | A user updates their email notifications status for the whole registry. | * `orgId` * `userId` * `subscriptionStatus` |\n| `modelRegistry` | `testRegistryWebhook` | A user tests the Model Registry webhook. | * `orgId` * `webhookId` |\n| `modelRegistry` | `transitionModelVersionStage` | A user transitions a model version to a different stage. | * `name` * `version` * `stage` * `archive_existing_versions` |\n| `modelRegistry` | `triggerRegistryWebhook` | A Model Registry webhook is triggered by an event. | * `orgId` * `registeredModelId` * `events` * `status` |\n| `modelRegistry` | `updateComment` | A user posts an edit to a comment on a model version. | * `id` |\n| `modelRegistry` | `updateRegistryWebhook` | A user updates a Model Registry webhook. | * `orgId` * `webhookId` |\n\n", "chunk_id": "0c0d9c542e937e5c8f7ed7f2301c58e0", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Model serving events\n\nThe following `serverlessRealTimeInference` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `serverlessRealTimeInference` | `changeInferenceEndpointAcl` | User updates permissions for an inference endpoint. | * `shardName` * `targetUserId` * `resourceId` * `aclPermissionSet` |\n| `serverlessRealTimeInference` | `createServingEndpoint` | User creates a model serving endpoint. | * `name` * `config` |\n| `serverlessRealTimeInference` | `deleteServingEndpoint` | User deletes a model serving endpoint. | * `name` |\n| `serverlessRealTimeInference` | `disable` | User disables model serving for a registered model. 
| * `registered_mode_name` |\n| `serverlessRealTimeInference` | `enable` | User enables model serving for a registered model. | * `registered_mode_name` |\n| `serverlessRealTimeInference` | `getQuerySchemaPreview` | User makes a call to get the query schema preview. | * `endpoint_name` |\n| `serverlessRealTimeInference` | `updateServingEndpoint` | User updates a model serving endpoint. | * `name` * `served_models` * `traffic_config` |\n\n", "chunk_id": "0fa8eece03e141455f62613abace1af4", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Notebook events\n\nThe following `notebook` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `notebook` | `attachNotebook` | A notebook is attached to a cluster. | * `path` * `clusterId` * `notebookId` |\n| `notebook` | `cloneNotebook` | A user clones a notebook. | * `notebookId` * `path` * `clonedNotebookId` * `destinationPath` |\n| `notebook` | `createNotebook` | A notebook is created. | * `notebookId` * `path` |\n| `notebook` | `deleteFolder` | A notebook folder is deleted. | * `path` |\n| `notebook` | `deleteNotebook` | A notebook is deleted. | * `notebookId` * `notebookName` * `path` |\n| `notebook` | `detachNotebook` | A notebook is detached from a cluster. | * `notebookId` * `clusterId` * `path` |\n| `notebook` | `downloadLargeResults` | A user downloads query results too large to display in the notebook. | * `notebookId` * `notebookFullPath` |\n| `notebook` | `downloadPreviewResults` | A user downloads the query results. | * `notebookId` * `notebookFullPath` |\n| `notebook` | `importNotebook` | A user imports a notebook. | * `path` |\n| `notebook` | `moveFolder` | A notebook folder is moved from one location to another. 
| * `oldPath` * `newPath` * `folderId` |\n| `notebook` | `moveNotebook` | A notebook is moved from one location to another. | * `newPath` * `oldPath` * `notebookId` |\n| `notebook` | `renameNotebook` | A notebook is renamed. | * `newName` * `oldName` * `parentPath` * `notebookId` |\n| `notebook` | `restoreFolder` | A deleted folder is restored. | * `path` |\n| `notebook` | `restoreNotebook` | A deleted notebook is restored. | * `path` * `notebookId` * `notebookName` |\n| `notebook` | `runCommand` | Available when verbose audit logs are enabled. Emitted after Databricks runs a command in a notebook. A command corresponds to a cell in a notebook. `executionTime` is measured in seconds. | * `notebookId` * `executionTime` * `status` * `commandId` * `commandText` * `commandLanguage` |\n| `notebook` | `takeNotebookSnapshot` | Notebook snapshots are taken when either the job service or mlflow is run. | * `path` |\n\n", "chunk_id": "7b326a8b34f922b2776409444857bc7d", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Partner Connect events\n\nThe following `partnerHub` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `partnerHub` | `createOrReusePartnerConnection` | A workspace admin sets up a connection to a partner solution. | * `partner_name` |\n| `partnerHub` | `deletePartnerConnection` | A workspace admin deletes a partner connection. | * `partner_name` |\n| `partnerHub` | `downloadPartnerConnectionFile` | A workspace admin downloads the partner connection file. | * `partner_name` |\n| `partnerHub` | `setupResourcesForPartnerConnection` | A workspace admin sets up resources for a partner connection. | * `partner_name` |\n\n### Audit log reference\n#### Remote history service events\n\nThe following `remoteHistoryService` events are logged at the workspace level. 
\n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `remoteHistoryService` | `addUserGitHubCredentials` | User adds GitHub credentials. | none |\n| `remoteHistoryService` | `deleteUserGitHubCredentials` | User removes GitHub credentials. | none |\n| `remoteHistoryService` | `updateUserGitHubCredentials` | User updates GitHub credentials. | none |\n\n", "chunk_id": "386fc90a624318c5eabeae29b4615347", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Git folder events\n\nThe following `repos` events are logged at the workspace level. \n| Service | Action name | Description | Request parameters |\n| --- | --- | --- | --- |\n| `repos` | `checkoutBranch` | A user checks out a branch on the repo. | * `id` * `branch` |\n| `repos` | `commitAndPush` | A user commits and pushes to a repo. | * `id` * `message` * `files` * `checkSensitiveToken` |\n| `repos` | `createRepo` | A user creates a repo in the workspace. | * `url` * `provider` * `path` |\n| `repos` | `deleteRepo` | A user deletes a repo. | * `id` |\n| `repos` | `discard` | A user discards a commit to a repo. | * `id` * `file_paths` |\n| `repos` | `getRepo` | A user makes a call to get information about a single repo. | * `id` |\n| `repos` | `listRepos` | A user makes a call to get all repos they have Manage permissions on. | * `path_prefix` * `next_page_token` |\n| `repos` | `pull` | A user pulls the latest commits from a repo. | * `id` |\n| `repos` | `updateRepo` | A user updates the repo to a different branch or tag, or to the latest commit on the same branch. 
| * `id` * `branch` * `tag` * `git_url` * `git_provider` |\n\n", "chunk_id": "29d674399a1d3cb54f92806a9d12fbc7", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Secrets events\n\nThe following `secrets` events are logged at the workspace level. \n| Service | Action name | Description | Request parameters |\n| --- | --- | --- | --- |\n| `secrets` | `createScope` | User creates a secret scope. | * `scope` * `initial_manage_principal` * `scope_backend_type` |\n| `secrets` | `deleteAcl` | User deletes ACLs for a secret scope. | * `scope` * `principal` |\n| `secrets` | `deleteScope` | User deletes a secret scope. | * `scope` |\n| `secrets` | `deleteSecret` | User deletes a secret from a scope. | * `key` * `scope` |\n| `secrets` | `getAcl` | User gets ACLs for a secret scope. | * `scope` * `principal` |\n| `secrets` | `getSecret` | User gets a secret from a scope. | * `key` * `scope` |\n| `secrets` | `listAcls` | User makes a call to list ACLs for a secret scope. | * `scope` |\n| `secrets` | `listScopes` | User makes a call to list secret scopes | none |\n| `secrets` | `listSecrets` | User makes a call to list secrets within a scope. | * `scope` |\n| `secrets` | `putAcl` | User changes ACLs for a secret scope. | * `scope` * `principal` * `permission` |\n| `secrets` | `putSecret` | User adds or edits a secret within a scope. | * `string_value` * `key` * `scope` |\n\n", "chunk_id": "944ef4e873abf7255985d8188444a2aa", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### SQL table access events\n\nNote \nThe `sqlPermissions` service includes events related to the legacy Hive metastore table access control. 
Databricks recommends that you [upgrade the tables managed by the Hive metastore to the Unity Catalog metastore](https://docs.databricks.com/data-governance/unity-catalog/migrate.html). \nThe following `sqlPermissions` events are logged at the workspace level. \n| Service | Action name | Description | Request parameters |\n| --- | --- | --- | --- |\n| `sqlPermissions` | `changeSecurableOwner` | Workspace admin or owner of an object transfers object ownership. | * `securable` * `principal` |\n| `sqlPermissions` | `createSecurable` | User creates a securable object. | * `securable` |\n| `sqlPermissions` | `denyPermission` | Object owner denies privileges on a securable object. | * `permission` |\n| `sqlPermissions` | `grantPermission` | Object owner grants permission on a securable object. | * `permission` |\n| `sqlPermissions` | `removeAllPermissions` | User drops a securable object. | * `securable` |\n| `sqlPermissions` | `renameSecurable` | User renames a securable object. | * `before` * `after` |\n| `sqlPermissions` | `requestPermissions` | User requests permissions on a securable object. | * `requests` |\n| `sqlPermissions` | `revokePermission` | Object owner revokes permissions on their securable object. | * `permission` |\n| `sqlPermissions` | `showPermissions` | User views securable object permissions. | * `securable` * `principal` |\n\n", "chunk_id": "2d35532f575520886f5899995d7122a2", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### SSH events\n\nThe following `ssh` events are logged at the workspace level. \n| Service | Action name | Description | Request parameters |\n| --- | --- | --- | --- |\n| `ssh` | `login` | Agent login of SSH into Spark driver. | * `containerId` * `userName` * `port` * `publicKey` * `instanceId` |\n| `ssh` | `logout` | Agent logout of SSH from Spark driver. 
| * `userName` * `containerId` * `instanceId` |\n\n### Audit log reference\n#### Vector search events\n\nThe following `vectorSearch` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `vectorSearch` | `createEndpoint` | User creates a vector search endpoint. | * `name` * `endpoint_type` |\n| `vectorSearch` | `deleteEndpoint` | User deletes a vector search endpoint. | * `name` |\n| `vectorSearch` | `createVectorIndex` | User creates a vector search index. | * `name` * `endpoint_name` * `primary_key` * `index_type` * `delta_sync_index_spec` * `direct_access_index_spec` |\n| `vectorSearch` | `deleteVectorIndex` | User deletes a vector search index. | * `name` * `endpoint_name` * `delete_embedding_writeback_table` |\n\n", "chunk_id": "308c22aa75f1ab7f02d0b772595cc2ca", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Web terminal events\n\nThe following `webTerminal` events are logged at the workspace level. \n| Service | Action name | Description | Request parameters |\n| --- | --- | --- | --- |\n| `webTerminal` | `startSession` | User starts a web terminal session. | * `socketGUID` * `clusterId` * `serverPort` * `ProxyTargetURI` |\n| `webTerminal` | `closeSession` | User closes a web terminal session. | * `socketGUID` * `clusterId` * `serverPort` * `ProxyTargetURI` |\n\n", "chunk_id": "d69835b7041fcb71396085cb18cfd67e", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Workspace events\n\nThe following `workspace` events are logged at the workspace level. \n| Service | Action name | Description | Request parameters |\n| --- | --- | --- | --- |\n| `workspace` | `changeWorkspaceAcl` | Permissions to the workspace are changed. 
| * `shardName` * `targetUserId` * `aclPermissionSet` * `resourceId` |\n| `workspace` | `deleteSetting` | A setting is deleted from the workspace. | * `settingKeyTypeName` * `settingKeyName` * `settingTypeName` * `settingName` |\n| `workspace` | `fileCreate` | User creates a file in the workspace. | * `path` |\n| `workspace` | `fileDelete` | User deletes a file in the workspace. | * `path` |\n| `workspace` | `fileEditorOpenEvent` | User opens the file editor. | * `notebookId` * `path` |\n| `workspace` | `getRoleAssignment` | User gets a workspace\u2019s user roles. | * `account_id` * `workspace_id` |\n| `workspace` | `mintOAuthAuthorizationCode` | Recorded when in-house OAuth authorization code is minted at the workspace level. | * `client_id` |\n| `workspace` | `mintOAuthToken` | OAuth token is minted for workspace. | * `grant_type` * `scope` * `expires_in` * `client_id` |\n| `workspace` | `moveWorkspaceNode` | A workspace admin moves workspace node. | * `destinationPath` * `path` |\n| `workspace` | `purgeWorkspaceNodes` | A workspace admin purges workspace nodes. | * `treestoreId` |\n| `workspace` | `reattachHomeFolder` | An existing home folder is re-attached for a user that is re-added to the workspace. | * `path` |\n| `workspace` | `renameWorkspaceNode` | A workspace admin renames workspace nodes. | * `path` * `destinationPath` |\n| `workspace` | `unmarkHomeFolder` | Home folder special attributes are removed when a user is removed from the workspace. | * `path` |\n| `workspace` | `updateRoleAssignment` | A workspace admin updates a workspace user\u2019s role. | * `account_id` * `workspace_id` * `principal_id` |\n| `workspace` | `setSetting` | A workspace admin configures a workspace setting. | * `settingKeyTypeName` * `settingKeyName` * `settingTypeName` * `settingName` * `settingValueForAudit` |\n| `workspace` | `workspaceConfEdit` | Workspace admin makes updates to a setting, for example enabling verbose audit logs. 
| * `workspaceConfKeys` * `workspaceConfValues` |\n| `workspace` | `workspaceExport` | User exports a notebook from a workspace. | * `workspaceExportDirectDownload` * `workspaceExportFormat` * `notebookFullPath` |\n| `workspace` | `workspaceInHouseOAuthClientAuthentication` | OAuth client is authenticated in workspace service. | * `user` |\n\n", "chunk_id": "e08c90b5ed88c05c0cbc982e12f0ff0b", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Billable usage events\n\nThe following `accountBillableUsage` events are logged at the account level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `accountBillableUsage` | `getAggregatedUsage` | User accessed aggregated billable usage (usage per day) for the account via the Usage Graph feature. | * `account_id` * `window_size` * `start_time` * `end_time` * `meter_name` * `workspace_ids_filter` |\n| `accountBillableUsage` | `getDetailedUsage` | User accessed detailed billable usage (usage for each cluster) for the account via the Usage Download feature. | * `account_id` * `start_month` * `end_month` * `with_pii` |\n\n", "chunk_id": "cb2cc73b2df7dd09f1876ec554816a8a", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Account-level account events\n\nThe following `accounts` events are logged at the account level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `accounts` | `accountInHouseOAuthClientAuthentication` | An OAuth client is authenticated. | * `endpoint` |\n| `accounts` | `accountIpAclsValidationFailed` | IP permissions validation fails. Returns statusCode 403. | * `sourceIpAddress` * `user`: logged as an email address |\n| `accounts` | `activateUser` | A user is reactivated after being deactivated. 
See [Deactivate users in account](https://docs.databricks.com/admin/users-groups/users.html#deactivate-user). | * `targetUserName` * `endpoint` * `targetUserId` |\n| `accounts` | `add` | A user is added to the Databricks account. | * `targetUserName` * `endpoint` * `targetUserId` |\n| `accounts` | `addPrincipalToGroup` | A user is added to an account-level group. | * `targetGroupId` * `endpoint` * `targetUserId` * `targetGroupName` * `targetUserName` |\n| `accounts` | `addPrincipalsToGroup` | Users are added to an account-level group using SCIM provisioning. | * `targetGroupId` * `endpoint` * `targetUserId` * `targetGroupName` * `targetUserName` |\n| `accounts` | `createGroup` | An account-level group is created. | * `endpoint` * `targetGroupId` * `targetGroupName` |\n| `accounts` | `deactivateUser` | A user is deactivated. See [Deactivate users in account](https://docs.databricks.com/admin/users-groups/users.html#deactivate-user). | * `targetUserName` * `endpoint` * `targetUserId` |\n| `accounts` | `delete` | A user is deleted from the Databricks account. | * `targetUserId` * `targetUserName` * `endpoint` |\n| `accounts` | `deleteSetting` | Account admin removes a setting from the Databricks account. | * `settingKeyTypeName` * `settingKeyName` * `settingTypeName` * `settingName` * `settingValueForAudit` |\n| `accounts` | `garbageCollectDbToken` | A user runs a garbage collect command on expired tokens. | * `tokenExpirationTime` * `tokenClientId` * `userId` * `tokenCreationTime` * `tokenFirstAccessed` |\n| `accounts` | `generateDbToken` | User generates a token from User Settings or when the service generates the token. | * `tokenExpirationTime` * `tokenCreatedBy` * `tokenHash` * `userId` |\n| `accounts` | `login` | A user logs into the account console. | * `user` |\n| `accounts` | `logout` | A user logs out of the account console. 
| * `user` |\n| `accounts` | `mintOAuthAuthorizationCode` | Recorded when in-house OAuth authorization code is minted at the account level. | * `client_id` |\n| `accounts` | `mintOAuthToken` | An account-level OAuth token is issued to the service principal. | * `user` |\n| `accounts` | `oidcBrowserLogin` | A user logs into their account with the OpenID Connect browser workflow. | * `user` |\n| `accounts` | `oidcTokenAuthorization` | An OIDC token is authenticated for an account admin login. | * `user` |\n| `accounts` | `passwordVerifyAuthentication` | A user\u2019s password is verified during account console login. | * `user` |\n| `accounts` | `removeAccountAdmin` | An account admin removes account admin permissions from another user. | * `targetUserName` * `endpoint` * `targetUserId` |\n| `accounts` | `removeGroup` | A group is removed from the account. | * `targetGroupId` * `targetGroupName` * `endpoint` |\n| `accounts` | `removePrincipalFromGroup` | A user is removed from an account-level group. | * `targetGroupId` * `endpoint` * `targetUserId` * `targetGroupName` * `targetUserName` |\n| `accounts` | `removePrincipalsFromGroup` | Users are removed from an account-level group using SCIM provisioning. | * `targetGroupId` * `endpoint` * `targetUserId` * `targetGroupName` * `targetUserName` |\n| `accounts` | `setAccountAdmin` | An account admin assigns the account admin role to another user. | * `targetUserName` * `endpoint` * `targetUserId` |\n| `accounts` | `setSetting` | An account admin updates an account-level setting. | * `settingKeyTypeName` * `settingKeyName` * `settingTypeName` * `settingName` * `settingValueForAudit` |\n| `accounts` | `tokenLogin` | A user logs into Databricks using a token. | * `tokenId` * `user` |\n| `accounts` | `updateUser` | An account admin updates a user account. | * `targetUserName` * `endpoint` * `targetUserId` |\n| `accounts` | `updateGroup` | An account admin updates an account-level group. 
| * `endpoint` * `targetGroupId` * `targetGroupName` |\n| `accounts` | `validateEmail` | A user validates their email after account creation. | * `endpoint` * `targetUserName` * `targetUserId` |\n\n", "chunk_id": "85d394c627fe177bfa42a32bf8d93613", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Account-level access control events\n\nThe following `accountsAccessControl` event is logged at the account level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `accountsAccessControl` | `updateRuleSet` | A rule set is changed. | * `account_id` * `name` * `rule_set` |\n\n", "chunk_id": "44fc65f5089be60479178eecc8cd556f", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Account management events\n\nThe following `accountsManager` events are logged at the account level. These events have to do with configurations made by account admins in the account console. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `accountsManager` | `acceptTos` | Admin accepts a workspace\u2019s terms of service. | * `workspace_id` |\n| `accountsManager` | `accountUserResetPassword` | Account admin resets a user\u2019s password. Also logs whether the user changed the password after the reset. | * `wasPasswordChanged` * `serviceSource` * `targetUserId` * `userId` * `newPasswordSource` |\n| `accountsManager` | `changeAccountOwner` | Account owner role is transferred to another account admin. | * `account_id` * `first_name` * `last_name` * `email` |\n| `accountsManager` | `consolidateAccounts` | The account was consolidated with another account by Databricks. 
| * `target_account_id` * `account_ids_to_consolidate` |\n| `accountsManager` | `createCredentialsConfiguration` | Account admin created a credentials configuration. | * `credentials` |\n| `accountsManager` | `createCustomerManagedKeyConfiguration` | Account admin created a customer-managed key configuration. | * `customer_managed_key` |\n| `accountsManager` | `createNetworkConfiguration` | Account admin created a network configuration. | * `network` |\n| `accountsManager` | `createPrivateAccessSettings` | Account admin created a private access settings configuration. | * `private_access_settings` |\n| `accountsManager` | `createStorageConfiguration` | Account admin created a storage configuration. | * `storage_configuration` |\n| `accountsManager` | `createVpcEndpoint` | Account admin created a VPC endpoint configuration. | * `vpc_endpoint` |\n| `accountsManager` | `createWorkspaceConfiguration` | Account admin creates a new workspace. The `workspace` request parameter is an array of deployment information including `workspace_name`. You can find the `workspace_id` in the `response.result` parameter. | * `workspace` |\n| `accountsManager` | `deleteCredentialsConfiguration` | Account admin deleted a credentials configuration. | * `account_id` * `credentials_id` |\n| `accountsManager` | `deleteCustomerManagedKeyConfiguration` | Account admin deleted a customer-managed key configuration. | * `account_id` * `customer_managed_key_id` |\n| `accountsManager` | `deleteNetworkConfiguration` | Account admin deleted a network configuration. | * `account_id` * `network_id` |\n| `accountsManager` | `deletePrivateAccessSettings` | Account admin deleted a private access settings configuration. | * `account_id` * `private_access_settings_id` |\n| `accountsManager` | `deleteStorageConfiguration` | Account admin deleted a storage configuration. 
| * `account_id` * `storage_configuration_id` |\n| `accountsManager` | `deleteVpcEndpoint` | Account admin deleted a VPC endpoint configuration. | * `account_id` * `vpc_endpoint_id` |\n| `accountsManager` | `deleteWorkspaceConfiguration` | Account admin deleted a workspace. | * `account_id` * `workspace_id` |\n| `accountsManager` | `getCredentialsConfiguration` | Account admin requests details about a credentials configuration. | * `account_id` * `credentials_id` |\n| `accountsManager` | `getCustomerManagedKeyConfiguration` | Account admin requests details about a customer-managed key configuration. | * `account_id` * `customer_managed_key_id` |\n| `accountsManager` | `getNetworkConfiguration` | Account admin requests details about a network configuration. | * `account_id` * `network_id` |\n| `accountsManager` | `getPrivateAccessSettings` | Account admin requests details about a private access settings configuration. | * `account_id` * `private_access_settings_id` |\n| `accountsManager` | `getStorageConfiguration` | Account admin requests details about a storage configuration. | * `account_id` * `storage_configuration_id` |\n| `accountsManager` | `getVpcEndpoint` | Account admin requests details about a VPC endpoint configuration. | * `account_id` * `vpc_endpoint_id` |\n| `accountsManager` | `getWorkspaceConfiguration` | Account admin requests details about a workspace. | * `account_id` * `workspace_id` |\n| `accountsManager` | `listCredentialsConfigurations` | Account admin lists all credentials configurations in the account. | * `account_id` |\n| `accountsManager` | `listCustomerManagedKeyConfigurations` | Account admin lists all customer-managed key configurations in the account. | * `account_id` |\n| `accountsManager` | `listNetworkConfigurations` | Account admin lists all network configurations in the account. | * `account_id` |\n| `accountsManager` | `listPrivateAccessSettings` | Account admin lists all private access settings configurations in the account. 
| * `account_id` |\n| `accountsManager` | `listStorageConfigurations` | Account admin lists all storage configurations in the account. | * `account_id` |\n| `accountsManager` | `listSubscriptions` | Account admin lists all account billing subscriptions. | * `account_id` |\n| `accountsManager` | `listVpcEndpoints` | Account admin listed all VPC endpoint configurations for the account. | * `account_id` |\n| `accountsManager` | `listWorkspaceConfigurations` | Account admin lists all workspaces in the account. | * `account_id` |\n| `accountsManager` | `listWorkspaceEncryptionKeyRecords` | Account admin lists all encryption key records in a specific workspace. | * `account_id` * `workspace_id` |\n| `accountsManager` | `listWorkspaceEncryptionKeyRecordsForAccount` | Account admin lists all encryption key records in the account. | * `account_id` |\n| `accountsManager` | `sendTos` | An email was sent to a workspace admin to accept the Databricks Terms of Service. | * `account_id` * `workspace_id` |\n| `accountsManager` | `updateAccount` | The account details were changed internally. | * `account_id` * `account` |\n| `accountsManager` | `updateSubscription` | The account billing subscriptions were updated. | * `account_id` * `subscription_id` * `subscription` |\n| `accountsManager` | `updateWorkspaceConfiguration` | Admin updated the configuration for a workspace. | * `account_id` * `workspace_id` |\n\n", "chunk_id": "897820272781194647cd035b634e8df5", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Log delivery events\n\nThe following `logDelivery` events are logged at the account level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `logDelivery` | `createLogDeliveryConfiguration` | Admin created a log delivery configuration. 
| * `account_id` * `config_id` |\n| `logDelivery` | `getLogDeliveryConfiguration` | Admin requested details about a log delivery configuration. | * `log_delivery_configuration` |\n| `logDelivery` | `listLogDeliveryConfigurations` | Admin listed all log delivery configurations in the account. | * `account_id` * `storage_configuration_id` * `credentials_id` * `status` |\n| `logDelivery` | `updateLogDeliveryConfiguration` | Admin updated a log delivery configuration. | * `config_id` * `account_id` * `status` |\n\n", "chunk_id": "a264dccea99f833109507fe5643c289f", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### OAuth SSO events\n\nThe following `oauth2` events are logged at the account level and are related to OAuth SSO authentication to the account console. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `oauth2` | `createCustomAppIntegration` | A workspace admin creates a custom app integration. | * `redirect_url` * `name` * `token_access_policy` * `confidential` |\n| `oauth2` | `createPublishedAppIntegration` | A workspace admin creates an app integration using a published app integration. | * `app_id` |\n| `oauth2` | `deleteCustomAppIntegration` | A workspace admin deletes a custom app integration. | * `integration_id` |\n| `oauth2` | `deletePublishedAppIntegration` | A workspace admin deletes a published app integration. | * `integration_id` |\n| `oauth2` | `enrollOAuth` | A workspace admin enrolls the account in OAuth. | * `enable_all_published_apps` |\n| `oauth2` | `updateCustomAppIntegration` | A workspace admin updates a custom app integration. | * `redirect_url` * `name` * `token_access_policy` * `confidential` |\n| `oauth2` | `updatePublishedAppIntegration` | A workspace admin updates a published app integration. 
| * `token_access_policy` |\n\n", "chunk_id": "ce955068c1d1778048df93514c4f9aa1", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Service principal credentials events (Public Preview)\n\nThe following `servicePrincipalCredentials` events are logged at the account level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `servicePrincipalCredentials` | `create` | Account admin generates an OAuth secret for the service principal. | * `account_id` * `service_principal` * `secret_id` |\n| `servicePrincipalCredentials` | `list` | Account admin lists all OAuth secrets under a service principal. | * `account_id` * `service_principal` |\n| `servicePrincipalCredentials` | `delete` | Account admin deletes a service principal\u2019s OAuth secret. | * `account_id` * `service_principal` * `secret_id` |\n\n### Audit log reference\n#### Single sign-on events\n\nThe following `ssoConfigBackend` events are logged at the account level and are related to SSO authentication for the account console. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `ssoConfigBackend` | `create` | Account admin created an account console SSO configuration. | * `account_id` * `sso_type` * `config` |\n| `ssoConfigBackend` | `get` | Account admin requested details about an account console SSO configuration. | * `account_id` * `sso_type` |\n| `ssoConfigBackend` | `update` | Account admin updated an account console SSO configuration. | * `account_id` * `sso_type` * `config` |\n\n", "chunk_id": "607281e5f155b33f2b52edb1e59e2237", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Unity Catalog events\n\nThe following audit events are related to Unity Catalog. 
Delta Sharing events are also logged under the `unityCatalog` service. For Delta Sharing events, see [Delta Sharing events](https://docs.databricks.com/admin/account-settings/audit-logs.html#ds). Unity Catalog audit events can be logged at the workspace level or account level depending on the event. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `unityCatalog` | `createMetastore` | Account admin creates a metastore. | * `name` * `storage_root` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getMetastore` | Account admin requests metastore ID. | * `id` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getMetastoreSummary` | Account admin requests details about a metastore. | * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `listMetastores` | Account admin requests a list of all metastores in an account. | * `workspace_id` |\n| `unityCatalog` | `updateMetastore` | Account admin makes an update to a metastore. | * `id` * `owner` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deleteMetastore` | Account admin deletes a metastore. | * `id` * `force` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updateMetastoreAssignment` | Account admin makes an update to a metastore\u2019s workspace assignment. | * `workspace_id` * `metastore_id` * `default_catalog_name` |\n| `unityCatalog` | `createExternalLocation` | Account admin creates an external location. | * `name` * `skip_validation` * `url` * `credential_name` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getExternalLocation` | Account admin requests details about an external location. | * `name_arg` * `include_browse` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `listExternalLocations` | Account admin requests a list of all external locations in an account. | * `url` * `max_results` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updateExternalLocation` | Account admin makes an update to an external location. 
| * `name_arg` * `owner` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deleteExternalLocation` | Account admin deletes an external location. | * `name_arg` * `force` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `createCatalog` | User creates a catalog. | * `name` * `comment` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deleteCatalog` | User deletes a catalog. | * `name_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getCatalog` | User requests details about a catalog. | * `name_arg` * `dependent` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updateCatalog` | User updates a catalog. | * `name_arg` * `isolation_mode` * `comment` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `listCatalog` | User makes a call to list all catalogs in the metastore. | * `name_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `createSchema` | User creates a schema. | * `name` * `catalog_name` * `comment` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deleteSchema` | User deletes a schema. | * `full_name_arg` * `force` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getSchema` | User requests details about a schema. | * `full_name_arg` * `dependent` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `listSchema` | User requests a list of all schemas in a catalog. | * `catalog_name` |\n| `unityCatalog` | `updateSchema` | User updates a schema. | * `full_name_arg` * `name` * `workspace_id` * `metastore_id` * `comment` |\n| `unityCatalog` | `createStagingTable` | | * `name` * `catalog_name` * `schema_name` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `createTable` | User creates a table. The request parameters differ depending on the type of table created. 
| * `name` * `data_source_format` * `catalog_name` * `schema_name` * `storage_location` * `columns` * `dry_run` * `table_type` * `view_dependencies` * `view_definition` * `sql_path` * `comment` |\n| `unityCatalog` | `deleteTable` | User deletes a table. | * `full_name_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getTable` | User requests details about a table. | * `include_delta_metadata` * `full_name_arg` * `dependent` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `privilegedGetTable` | | * `full_name_arg` |\n| `unityCatalog` | `listTables` | User makes a call to list all tables in a schema. | * `catalog_name` * `schema_name` * `workspace_id` * `metastore_id` * `include_browse` |\n| `unityCatalog` | `listTableSummaries` | User gets an array of summaries for tables for a schema and catalog within the metastore. | * `catalog_name` * `schema_name_pattern` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updateTables` | User makes an update to a table. The request parameters displayed vary depending on the type of table updates made. | * `full_name_arg` * `table_type` * `table_constraint_list` * `data_source_format` * `columns` * `dependent` * `row_filter` * `storage_location` * `sql_path` * `view_definition` * `view_dependencies` * `owner` * `comment` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `createStorageCredential` | Account admin creates a storage credential. You might see an additional request parameter based on your cloud provider credentials. | * `name` * `comment` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `listStorageCredentials` | Account admin makes a call to list all storage credentials in the account. | * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getStorageCredential` | Account admin requests details about a storage credential. | * `name_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updateStorageCredential` | Account admin makes an update to a storage credential. 
| * `name_arg` * `owner` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deleteStorageCredential` | Account admin deletes a storage credential. | * `name_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `generateTemporaryTableCredential` | Logged whenever a temporary credential is granted for a table. You can use this event to determine who queried what and when. | * `credential_id` * `credential_type` * `is_permissions_enforcing_client` * `table_full_name` * `operation` * `table_id` * `workspace_id` * `table_url` * `metastore_id` |\n| `unityCatalog` | `generateTemporaryPathCredential` | Logged whenever a temporary credential is granted for a path. | * `url` * `operation` * `make_path_only_parent` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getPermissions` | User makes a call to get permission details for a securable object. This call doesn\u2019t return inherited permissions, only explicitly assigned permissions. | * `securable_type` * `securable_full_name` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getEffectivePermissions` | User makes a call to get all permission details for a securable object. An effective permissions call returns both explicitly assigned and inherited permissions. | * `securable_type` * `securable_full_name` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updatePermissions` | User updates permissions on a securable object. | * `securable_type` * `changes` * `securable_full_name` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `metadataSnapshot` | User queries the metadata from a previous table version. | * `securables` * `include_delta_metadata` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `metadataAndPermissionsSnapshot` | User queries the metadata and permissions from a previous table version. 
| * `securables` * `include_delta_metadata` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updateMetadataSnapshot` | User updates the metadata from a previous table version. | * `table_list_snapshots` * `schema_list_snapshots` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getForeignCredentials` | User makes a call to get details about a foreign key. | * `securables` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getInformationSchema` | User makes a call to get details about a schema. | * `table_name` * `page_token` * `required_column_names` * `row_set_type` * `required_column_names` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `createConstraint` | User creates a constraint for a table. | * `full_name_arg` * `constraint` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deleteConstraint` | User deletes a constraint for a table. | * `full_name_arg` * `constraint` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `createPipeline` | User creates a Unity Catalog pipeline. | * `target_catalog_name` * `has_workspace_definition` * `id` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updatePipeline` | User updates a Unity Catalog pipeline. | * `id_arg` * `definition_json` * `id` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getPipeline` | User requests details about a Unity Catalog pipeline. | * `id` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deletePipeline` | User deletes a Unity Catalog pipeline. | * `id` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deleteResourceFailure` | Resource fails to delete | none |\n| `unityCatalog` | `createVolume` | User creates a Unity Catalog volume. | * `name` * `catalog_name` * `schema_name` * `volume_type` * `storage_location` * `owner` * `comment` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getVolume` | User makes a call to get information on a Unity Catalog volume. 
| * `volume_full_name` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updateVolume` | User updates a Unity Catalog volume\u2019s metadata with the `ALTER VOLUME` or `COMMENT ON` calls. | * `volume_full_name` * `name` * `owner` * `comment` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deleteVolume` | User deletes a Unity Catalog volume. | * `volume_full_name` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `listVolumes` | User makes a call to get a list of all Unity Catalog volumes in a schema. | * `catalog_name` * `schema_name` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `generateTemporaryVolumeCredential` | A temporary credential is generated when a user performs a read or write on a volume. You can use this event to determine who accessed a volume and when. | * `volume_id` * `volume_full_name` * `operation` * `volume_storage_location` * `credential_id` * `credential_type` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getTagSecurableAssignments` | Tag assignments for a securable are fetched | * `securable_type` * `securable_full_name` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getTagSubentityAssignments` | Tag assignments for a subentity are fetched | * `securable_type` * `securable_full_name` * `workspace_id` * `metastore_id` * `subentity_name` |\n| `unityCatalog` | `UpdateTagSecurableAssignments` | Tag assignments for a securable are updated | * `securable_type` * `securable_full_name` * `workspace_id` * `metastore_id` * `changes` |\n| `unityCatalog` | `UpdateTagSubentityAssignments` | Tag assignments for a subentity are updated | * `securable_type` * `securable_full_name` * `workspace_id` * `metastore_id` * `subentity_name` * `changes` |\n| `unityCatalog` | `createRegisteredModel` | User creates a Unity Catalog registered model. 
| * `name` * `catalog_name` * `schema_name` * `owner` * `comment` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getRegisteredModel` | User makes a call to get information on a Unity Catalog registered model. | * `full_name_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updateRegisteredModel` | User updates a Unity Catalog registered model\u2019s metadata. | * `full_name_arg` * `name` * `owner` * `comment` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deleteRegisteredModel` | User deletes a Unity Catalog registered model. | * `full_name_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `listRegisteredModels` | User makes a call to get a list of Unity Catalog registered models in a schema, or list models across catalogs and schemas. | * `catalog_name` * `schema_name` * `max_results` * `page_token` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `createModelVersion` | User creates a model version in Unity Catalog. | * `catalog_name` * `schema_name` * `model_name` * `source` * `comment` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `finalizeModelVersion` | User makes a call to \u201cfinalize\u201d a Unity Catalog model version after uploading model version files to its storage location, making it read-only and usable in inference workflows. | * `full_name_arg` * `version_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getModelVersion` | User makes a call to get details on a model version. | * `full_name_arg` * `version_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getModelVersionByAlias` | User makes a call to get details on a model version using the alias. | * `full_name_arg` * `include_aliases` * `alias_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updateModelVersion` | User updates a model version\u2019s metadata. 
| * `full_name_arg` * `version_arg` * `name` * `owner` * `comment` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deleteModelVersion` | User deletes a model version. | * `full_name_arg` * `version_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `listModelVersions` | User makes a call to get a list of Unity Catalog model versions in a registered model. | * `catalog_name` * `schema_name` * `model_name` * `max_results` * `page_token` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `generateTemporaryModelVersionCredential` | A temporary credential is generated when a user performs a write (during initial model version creation) or read (after the model version has been finalized) on a model version. You can use this event to determine who accessed a model version and when. | * `full_name_arg` * `version_arg` * `operation` * `model_version_url` * `credential_id` * `credential_type` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `setRegisteredModelAlias` | User sets an alias on a Unity Catalog registered model. | * `full_name_arg` * `alias_arg` * `version` |\n| `unityCatalog` | `deleteRegisteredModelAlias` | User deletes an alias on a Unity Catalog registered model. | * `full_name_arg` * `alias_arg` |\n| `unityCatalog` | `getModelVersionByAlias` | User gets a Unity Catalog model version by alias. | * `full_name_arg` * `alias_arg` |\n| `unityCatalog` | `createConnection` | A new foreign connection is created. | * `name` * `connection_type` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deleteConnection` | A foreign connection is deleted. | * `name_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getConnection` | A foreign connection is retrieved. | * `name_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updateConnection` | A foreign connection is updated. 
| * `name_arg` * `owner` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `listConnections` | Foreign connections in a metastore are listed. | * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `createFunction` | User creates a new function. | * `function_info` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `updateFunction` | User updates a function. | * `full_name_arg` * `owner` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `listFunctions` | User requests a list of all functions within a specific parent catalog or schema. | * `catalog_name` * `schema_name` * `include_browse` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `getFunction` | User requests a function from a parent catalog or schema. | * `full_name_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `deleteFunction` | User deletes a function from a parent catalog or schema. | * `full_name_arg` * `workspace_id` * `metastore_id` |\n| `unityCatalog` | `createShareMarketplaceListingLink` | | * `links_infos` * `metastore_id` |\n| `unityCatalog` | `deleteShareMarketplaceListingLink` | | * `links_infos` * `metastore_id` |\n\n", "chunk_id": "8b7ba3d366db1246139c2a0763f67ec2", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Delta Sharing events\n\nDelta Sharing events are broken up into two sections: events recorded in the data provider\u2019s account and events recorded in the data recipient\u2019s account. \n### Delta Sharing provider events \nThe following audit log events are logged in the provider\u2019s account. Actions that are performed by recipients start with the `deltaSharing` prefix. Each of these logs also includes `request_params.metastore_id`, which is the metastore that manages the shared data, and `userIdentity.email`, which is the ID of the user who initiated the activity. 
\n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `unityCatalog` | `deltaSharingListShares` | A data recipient requests a list of shares. | * `options`: The pagination options provided with this request. * `recipient_name`: Indicates the recipient executing the action. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingGetShare` | A data recipient requests details about a share. | * `share`: The name of the share. * `recipient_name`: Indicates the recipient executing the action. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingListSchemas` | A data recipient requests a list of shared schemas. | * `share`: The name of the share. * `recipient_name`: Indicates the recipient executing the action. * `options`: The pagination options provided with this request. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingListAllTables` | A data recipient requests a list of all shared tables. | * `share`: The name of the share. * `recipient_name`: Indicates the recipient executing the action. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingListTables` | A data recipient requests a list of shared tables. | * `share`: The name of the share. * `recipient_name`: Indicates the recipient executing the action. 
* `options`: The pagination options provided with this request. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingGetTableMetadata` | A data recipient requests details about a table\u2019s metadata. | * `share`: The name of the share. * `recipient_name`: Indicates the recipient executing the action. * `schema`: The name of the schema. * `name`: The name of the table. * `predicateHints`: The predicates included in the query. * `limitHints`: The maximum number of rows to return. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingGetTableVersion` | A data recipient requests details about a table version. | * `share`: The name of the share. * `recipient_name`: Indicates the recipient executing the action. * `schema`: The name of the schema. * `name`: The name of the table. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingQueryTable` | Logged when a data recipient queries a shared table. | * `share`: The name of the share. * `recipient_name`: Indicates the recipient executing the action. * `schema`: The name of the schema. * `name`: The name of the table. * `predicateHints`: The predicates included in the query. * `limitHints`: The maximum number of rows to return. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. 
|\n| `unityCatalog` | `deltaSharingQueryTableChanges` | Logged when a data recipient queries change data for a table. | * `share`: The name of the share. * `recipient_name`: Indicates the recipient executing the action. * `schema`: The name of the schema. * `name`: The name of the table. * `cdf_options`: Change data feed options. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingQueriedTable` | Logged after a data recipient gets a response to their query. The `response.result` field includes more information on the recipient\u2019s query (see [Audit and monitor data sharing](https://docs.databricks.com/data-sharing/audit-logs.html)) | * `recipient_name`: Indicates the recipient executing the action. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingQueriedTableChanges` | Logged after a data recipient gets a response to their query. The `response.result` field includes more information on the recipient\u2019s query (see [Audit and monitor data sharing](https://docs.databricks.com/data-sharing/audit-logs.html)). | * `recipient_name`: Indicates the recipient executing the action. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingListNotebookFiles` | A data recipient requests a list of shared notebook files. | * `share`: The name of the share. * `recipient_name`: Indicates the recipient executing the action. * `is_ip_access_denied`: None if there is no IP access list configured. 
Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingQueryNotebookFile` | A data recipient queries a shared notebook file. | * `file_name`: The name of the notebook file. * `recipient_name`: Indicates the recipient executing the action. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingListFunctions` | A data recipient requests a list of functions in a parent schema. | * `share`: The name of the share. * `schema`: The name of the parent schema of the function. * `recipient_name`: Indicates the recipient executing the action. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingListAllFunctions` | A data recipient requests a list of all shared functions. | * `share`: The name of the share. * `schema`: The name of the parent schema of the function. * `recipient_name`: Indicates the recipient executing the action. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingListFunctionVersions` | A data recipient requests a list of function versions. | * `share`: The name of the share. * `schema`: The name of the parent schema of the function. * `function`: The name of the function. * `recipient_name`: Indicates the recipient executing the action. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. 
`sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingListVolumes` | A data recipient requests a list of shared volumes in a schema. | * `share`: The name of the share. * `schema`: The parent schema of the volumes. * `recipient_name`: Indicates the recipient executing the action. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `deltaSharingListAllVolumes` | A data recipient requests all shared volumes. | * `share`: The name of the share. * `recipient_name`: Indicates the recipient executing the action. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, true if the request was denied and false if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `updateMetastore` | Provider updates their metastore. | * `delta_sharing_scope`: Values can be `INTERNAL` or `INTERNAL_AND_EXTERNAL`. * `delta_sharing_recipient_token_lifetime_in_seconds`: If present, indicates that the recipient token lifetime was updated. |\n| `unityCatalog` | `createRecipient` | Provider creates a data recipient. | * `name`: The name of the recipient. * `comment`: The comment for the recipient. * `ip_access_list.allowed_ip_addresses`: Recipient IP address allowlist. |\n| `unityCatalog` | `deleteRecipient` | Provider deletes a data recipient. | * `name`: The name of the recipient. |\n| `unityCatalog` | `getRecipient` | Provider requests details about a data recipient. | * `name`: The name of the recipient. |\n| `unityCatalog` | `listRecipients` | Provider requests a list of all their data recipients. | none |\n| `unityCatalog` | `rotateRecipientToken` | Provider rotates a recipient\u2019s token. | * `name`: The name of the recipient. * `comment`: The comment given in the rotation command. 
|\n| `unityCatalog` | `updateRecipient` | Provider updates a data recipient\u2019s attributes. | * `name`: The name of the recipient. * `updates`: A JSON representation of recipient attributes that were added or removed from the share. |\n| `unityCatalog` | `createShare` | Provider creates a share. | * `name`: The name of the share. * `comment`: The comment for the share. |\n| `unityCatalog` | `deleteShare` | Provider deletes a share. | * `name`: The name of the share. |\n| `unityCatalog` | `getShare` | Provider requests details about a share. | * `name`: The name of the share. * `include_shared_objects`: Whether the share\u2019s table names were included in the request. |\n| `unityCatalog` | `updateShare` | Provider adds or removes data assets from a share. | * `name`: The name of the share. * `updates`: A JSON representation of data assets that were added or removed from the share. Each item includes `action` (add or remove), `name` (the actual name of the table), `shared_as` (the name the asset was shared as, if different from the actual name), and `partition_specification` (if a partition specification was provided). |\n| `unityCatalog` | `listShares` | Provider requests a list of their shares. | none |\n| `unityCatalog` | `getSharePermissions` | Provider requests details on a share\u2019s permissions. | * `name`: The name of the share. |\n| `unityCatalog` | `updateSharePermissions` | Provider updates a share\u2019s permissions. | * `name`: The name of the share. * `changes`: A JSON representation of the updated permissions. Each change includes `principal` (the user or group to whom permission is granted or revoked), `add` (the list of permissions that were granted), and `remove` (the list of permissions that were revoked). |\n| `unityCatalog` | `getRecipientSharePermissions` | Provider requests details about a recipient\u2019s share permissions. | * `name`: The name of the share. 
|\n| `unityCatalog` | `getActivationUrlInfo` | Provider requests details about activity on their activation link. | * `recipient_name`: The name of the recipient who opened the activation URL. * `is_ip_access_denied`: None if there is no IP access list configured. Otherwise, `true` if the request was denied and `false` if the request was not denied. `sourceIPaddress` is the recipient IP address. |\n| `unityCatalog` | `generateTemporaryVolumeCredential` | Temporary credential is generated for the recipient to access a shared volume. | * `share_name`: The name of the share through which the recipient requests. * `share_id`: The ID of the share. * `share_owner`: The owner of the share. * `recipient_name`: The name of the recipient who requests the credential. * `recipient_id`: The ID of the recipient. * `volume_full_name`: The full 3-level name of the volume. * `volume_id`: The ID of the volume. * `volume_storage_location`: The cloud path of the volume root. * `operation`: Either `READ_VOLUME` or `WRITE_VOLUME`. For volume sharing, only `READ_VOLUME` is supported. * `credential_id`: The ID of the credential. * `credential_type`: The type of the credential. Value is always `StorageCredential`. * `workspace_id`: Value is always `0` when the request is for shared volumes. |\n| `unityCatalog` | `generateTemporaryTableCredential` | Temporary credential is generated for the recipient to access a shared table. | * `share_name`: The name of the share through which the recipient requests. * `share_id`: The ID of the share. * `share_owner`: The owner of the share. * `recipient_name`: The name of the recipient who requests the credential. * `recipient_id`: The ID of the recipient. * `table_full_name`: The full 3-level name of the table. * `table_id`: The ID of the table. * `table_url`: The cloud path of the table root. * `operation`: Either `READ` or `READ_WRITE`. * `credential_id`: The ID of the credential. * `credential_type`: The type of the credential. 
Value is always `StorageCredential`. * `workspace_id`: Value is always `0` when the request is for shared tables. | \n### Delta Sharing recipient events \nThe following events are logged in the data recipient\u2019s account. These events record recipient access of shared data and AI assets, along with events associated with the management of providers. Each of these events also includes the following request parameters: \n* `recipient_name`: The name of the recipient in the data provider\u2019s system.\n* `metastore_id`: The name of the metastore in the data provider\u2019s system.\n* `sourceIPAddress`: The IP address where the request originated. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `unityCatalog` | `deltaSharingProxyGetTableVersion` | A data recipient requests details on a shared table version. | * `share`: The name of the share. * `schema`: The name of the table\u2019s parent schema. * `name`: The name of the table. |\n| `unityCatalog` | `deltaSharingProxyGetTableMetadata` | A data recipient requests details on a shared table\u2019s metadata. | * `share`: The name of the share. * `schema`: The name of the table\u2019s parent schema. * `name`: The name of the table. |\n| `unityCatalog` | `deltaSharingProxyQueryTable` | A data recipient queries a shared table. | * `share`: The name of the share. * `schema`: The name of the table\u2019s parent schema. * `name`: The name of the table. * `limitHints`: The maximum number of rows to return. * `predicateHints`: The predicates included in the query. * `version`: Table version, if change data feed is enabled. |\n| `unityCatalog` | `deltaSharingProxyQueryTableChanges` | A data recipient queries change data for a table. | * `share`: The name of the share. * `schema`: The name of the table\u2019s parent schema. * `name`: The name of the table. * `cdf_options`: Change data feed options. |\n| `unityCatalog` | `createProvider` | A data recipient creates a provider object. 
| * `name`: The name of the provider. * `comment`: The comment for the provider. |\n| `unityCatalog` | `updateProvider` | A data recipient updates a provider object. | * `name`: The name of the provider. * `updates`: A JSON representation of provider attributes that were added or removed from the share. Each item includes `action` (add or remove) and can include `name` (the new provider name), `owner` (new owner), and `comment`. |\n| `unityCatalog` | `deleteProvider` | A data recipient deletes a provider object. | * `name`: The name of the provider. |\n| `unityCatalog` | `getProvider` | A data recipient requests details about a provider object. | * `name`: The name of the provider. |\n| `unityCatalog` | `listProviders` | A data recipient requests a list of providers. | none |\n| `unityCatalog` | `activateProvider` | A data recipient activates a provider object. | * `name`: The name of the provider. |\n| `unityCatalog` | `listProviderShares` | A data recipient requests a list of a provider\u2019s shares. | * `name`: The name of the provider. |\n| `unityCatalog` | `generateTemporaryVolumeCredential` | Temporary credential is generated for the recipient to access a shared volume. | * `share_name`: The name of the share through which the recipient requests. * `volume_full_name`: The full 3-level name of the volume. * `volume_id`: The ID of the volume. * `operation`: Either `READ_VOLUME` or `WRITE_VOLUME`. For volume sharing, only `READ_VOLUME` is supported. * `workspace_id`: The ID of the workspace that receives the user request. |\n| `unityCatalog` | `generateTemporaryTableCredential` | Temporary credential is generated for the recipient to access a shared table. | * `share_name`: The name of the share through which the recipient requests. * `table_full_name`: The full 3-level name of the table. * `table_id`: The ID of the table. * `operation`: Either `READ` or `READ_WRITE`. * `workspace_id`: The ID of the workspace that receives the user request. 
|\n\n", "chunk_id": "8ada1dda770c25569043be839344cad7", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Additional security monitoring events\n\nFor Databricks compute resources in the [classic compute plane](https://docs.databricks.com/getting-started/overview.html), such as VMs for clusters and pro or classic SQL warehouses, the following features enable additional monitoring agents: \n* [Enhanced security monitoring](https://docs.databricks.com/security/privacy/enhanced-security-monitoring.html)\n* [Compliance security profile](https://docs.databricks.com/security/privacy/security-profile.html). The compliance security profile is required for the compliance controls for [PCI-DSS](https://docs.databricks.com/security/privacy/pci.html), [HIPAA](https://docs.databricks.com/security/privacy/hipaa.html), and [FedRAMP Moderate](https://docs.databricks.com/security/privacy/fedramp.html). \nFor serverless SQL warehouses, the monitoring agents run if the compliance security profile is enabled and [the region supports serverless SQL warehouses with the compliance security profile](https://docs.databricks.com/admin/sql/serverless.html#security-profile). \n### File integrity monitoring events \nThe following `capsule8-alerts-dataplane` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `capsule8-alerts-dataplane` | `Heartbeat` | A regular event to confirm the monitor is on. Currently runs every 10 minutes. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Memory Marked Executable` | Memory is often marked executable in order to allow malicious code to execute when an application is being exploited. Alerts when a program sets heap or stack memory permissions to executable. This can cause false positives for certain application servers. 
| * `instanceId` |\n| `capsule8-alerts-dataplane` | `File Integrity Monitor` | Monitors the integrity of important system files. Alerts on any unauthorized changes to those files. Databricks defines specific sets of system paths on the image, and this set of paths might change over time. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Systemd Unit File Modified` | Changes to systemd units could result in security controls being relaxed or disabled, or the installation of a malicious service. Alerts whenever a `systemd` unit file is modified by a program other than `systemctl`. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Repeated Program Crashes` | Repeated program crashes could indicate that an attacker is attempting to exploit a memory corruption vulnerability, or that there is a stability issue in the affected application. Alerts when more than 5 instances of an individual program crash via segmentation fault. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Userfaultfd Usage` | As containers are typically static workloads, this alert could indicate that an attacker has compromised the container and is attempting to install and run a backdoor. Alerts when a file that has been created or modified within 30 minutes is then executed within a container. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `New File Executed in Container` | Memory is often marked executable in order to allow malicious code to execute when an application is being exploited. Alerts when a program sets heap or stack memory permissions to executable. This can cause false positives for certain application servers. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Suspicious Interactive Shell` | Interactive shells are rare occurrences on modern production infrastructure. Alerts when an interactive shell is started with arguments commonly used for reverse shells. 
| * `instanceId` |\n| `capsule8-alerts-dataplane` | `User Command Logging Evasion` | Evading command logging is common practice for attackers, but might also indicate that a legitimate user is performing unauthorized actions or trying to evade policy. Alerts when a change to user command history logging is detected, indicating that a user is attempting to evade command logging. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `BPF Program Executed` | Detects some types of kernel backdoors. The loading of a new Berkeley Packet Filter (BPF) program could indicate that an attacker is loading a BPF-based rootkit to gain persistence and avoid detection. Alerts when a process loads a new privileged BPF program, if the process is already part of an ongoing incident. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Kernel Module Loaded` | Attackers commonly load malicious kernel modules (rootkits) to evade detection and maintain persistence on a compromised node. Alerts when a kernel module is loaded, if the program is already part of an ongoing incident. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Suspicious Program Name Executed-Space After File` | Attackers might create or rename malicious binaries to include a space at the end of the name in an effort to impersonate a legitimate system program or service. Alerts when a program is executed with a space after the program name. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Illegal Elevation Of Privileges` | Kernel privilege escalation exploits commonly enable an unprivileged user to gain root privileges without passing standard gates for privilege changes. Alerts when a program attempts to elevate privileges through unusual means. This can issue false positive alerts on nodes with significant workloads. 
| * `instanceId` |\n| `capsule8-alerts-dataplane` | `Kernel Exploit` | Internal kernel functions are not accessible to regular programs, and if called, are a strong indicator that a kernel exploit has executed and that the attacker has full control of the node. Alerts when a kernel function unexpectedly returns to user space. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Processor-Level Protections Disabled` | SMEP and SMAP are processor-level protections that increase difficulty for kernel exploits to succeed, and disabling these restrictions is a common early step in kernel exploits. Alerts when a program tampers with the kernel SMEP/SMAP configuration. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Container Escape via Kernel Exploitation` | Alerts when a program uses kernel functions commonly used in container escape exploits, indicating that an attacker is escalating privileges from container-access to node-access. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Privileged Container Launched` | Privileged containers have direct access to host resources, leading to a greater impact when compromised. Alerts when a privileged container is launched, if the container isn\u2019t a known privileged image such as kube-proxy. This can issue unwanted alerts for legitimate privileged containers. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Userland Container Escape` | Many container escapes coerce the host to execute an in-container binary, resulting in the attacker gaining full control of the affected node. Alerts when a container-created file is executed from outside a container. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `AppArmor Disabled In Kernel` | Modification of certain AppArmor attributes can only occur in-kernel, indicating that AppArmor has been disabled by a kernel exploit or rootkit. Alerts when the AppArmor state is changed from the AppArmor configuration detected when the sensor starts. 
| * `instanceId` |\n| `capsule8-alerts-dataplane` | `AppArmor Profile Modified` | Attackers might attempt to disable enforcement of AppArmor profiles as part of evading detection. Alerts when a command for modifying an AppArmor profile is executed, if it was not executed by a user in an SSH session. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Boot Files Modified` | If not performed by a trusted source (such as a package manager or configuration management tool), modification of boot files could indicate an attacker modifying the kernel or its options in order to gain persistent access to a host. Alerts when changes are made to files in `/boot`, indicating installation of a new kernel or boot configuration. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Log Files Deleted` | Log deletion not performed by a log management tool could indicate that an attacker is trying to remove indicators of compromise. Alerts on deletion of system log files. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `New File Executed` | Newly created files from sources other than system update programs might be backdoors, kernel exploits, or part of an exploitation chain. Alerts when a file that has been created or modified within 30 minutes is then executed, excluding files created by system update programs. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Root Certificate Store Modified` | Modification of the root certificate store could indicate the installation of a rogue certificate authority, enabling interception of network traffic or bypass of code signature verification. Alerts when a system CA certificate store is changed. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Setuid/Setgid Bit Set On File` | Setting `setuid/setgid` bits can be used to provide a persistent method for privilege escalation on a node. Alerts when the `setuid` or `setgid` bit is set on a file with the `chmod` family of system calls. 
| * `instanceId` |\n| `capsule8-alerts-dataplane` | `Hidden File Created` | Attackers often create hidden files as a means of obscuring tools and payloads on a compromised host. Alerts when a hidden file is created by a process associated with an ongoing incident. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Modification Of Common System Utilities` | Attackers might modify system utilities in order to execute malicious payloads whenever these utilities are run. Alerts when a common system utility is modified by an unauthorized process. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Network Service Scanner Executed` | An attacker or rogue user might use or install these programs to survey connected networks for additional nodes to compromise. Alerts when common network scanning program tools are executed. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Network Service Created` | Attackers might start a new network service to provide easy access to a host after compromise. Alerts when a program starts a new network service, if the program is already part of an ongoing incident. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Network Sniffing Program Executed` | An attacker or rogue user might execute network sniffing commands to capture credentials, personally-identifiable information (PII), or other sensitive information. Alerts when a program is executed that allows network capture. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Remote File Copy Detected` | Use of file transfer tools could indicate that an attacker is attempting to move toolsets to additional hosts or exfiltrate data to a remote system. Alerts when a program associated with remote file copying is executed, if the program is already part of an ongoing incident. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Unusual Outbound Connection Detected` | Command and Control channels and cryptocoin miners often create new outbound network connections on unusual ports. 
Alerts when a program initiates a new connection on an uncommon port, if the program is already part of an ongoing incident. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Data Archived Via Program` | After gaining access to a system, an attacker might create a compressed archive of files to reduce the size of data for exfiltration. Alerts when a data compression program is executed, if the program is already part of an ongoing incident. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Process Injection` | Use of process injection techniques commonly indicates that a user is debugging a program, but might also indicate that an attacker is reading secrets from or injecting code into other processes. Alerts when a program uses `ptrace` (debugging) mechanisms to interact with another process. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Account Enumeration Via Program` | Attackers often use account enumeration programs to determine their level of access and to see if other users are currently logged in to the node. Alerts when a program associated with account enumeration is executed, if the program is already part of an ongoing incident. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `File and Directory Discovery Via Program` | Exploring file systems is common post-exploitation behavior for an attacker looking for credentials and data of interest. Alerts when a program associated with file and directory enumeration is executed, if the program is already part of an ongoing incident. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Network Configuration Enumeration Via Program` | Attackers can interrogate local network and route information to identify adjacent hosts and networks ahead of lateral movement. Alerts when a program associated with network configuration enumeration is executed, if the program is already part of an ongoing incident. 
| * `instanceId` |\n| `capsule8-alerts-dataplane` | `Process Enumeration Via Program` | Attackers often list running programs in order to identify the purpose of a node and whether any security or monitoring tools are in place. Alerts when a program associated with process enumeration is executed, if the program is already part of an ongoing incident. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `System Information Enumeration Via Program` | Attackers commonly execute system enumeration commands to determine Linux kernel and distribution versions and features, often to identify if the node is affected by specific vulnerabilities. Alerts when a program associated with system information enumeration is executed, if the program is already part of an ongoing incident. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Scheduled Tasks Modified Via Program` | Modifying scheduled tasks is a common method for establishing persistence on a compromised node. Alerts when the `crontab`, `at`, or `batch` commands are used to modify scheduled task configurations. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Systemctl Usage Detected` | Changes to systemd units could result in security controls being relaxed or disabled, or the installation of a malicious service. Alerts when the `systemctl` command is used to modify systemd units. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `User Execution Of su Command` | Explicit escalation to the root user decreases the ability to correlate privileged activity to a specific user. Alerts when the `su` command is executed. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `User Execution Of sudo Command` | Alerts when the `sudo` command is executed. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `User Command History Cleared` | Deleting the history file is unusual, commonly performed by attackers hiding activity, or by legitimate users intending to evade audit controls. 
Alerts when command line history files are deleted. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `New System User Added` | An attacker might add a new user to a host to provide a reliable method of access. Alerts if a new user entity is added to the local account management file `/etc/passwd`, if the entity is not added by a system update program. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `Password Database Modification` | Attackers might directly modify identity-related files to add a new user to the system. Alerts when a file related to user passwords is modified by a program unrelated to updating existing user information. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `SSH Authorized Keys Modification` | Adding a new SSH public key is a common method for gaining persistent access to a compromised host. Alerts when an attempt to write to a user\u2019s SSH `authorized_keys` file is observed, if the program is already part of an ongoing incident. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `User Account Created Via CLI` | Adding a new user is a common step for attackers when establishing persistence on a compromised node. Alerts when an identity management program is executed by a program other than a package manager. | * `instanceId` |\n| `capsule8-alerts-dataplane` | `User Configuration Changes` | User profile and configuration files are often modified as a method of persistence in order to execute a program whenever a user logs in. Alerts when `.bash_profile` and `.bashrc` (as well as related files) are modified by a program other than a system update tool. 
| * `instanceId` | \n### Antivirus monitoring events \nNote \nThe `response` JSON object in these audit logs always has a `result` field that includes one line of the original scan result. Each scan result is typically represented by multiple audit log records, one for each line of the original scan output. For details of what could appear in this file, see the following [third-party documentation](https://docs.clamav.net/). \nThe following `clamAVScanService-dataplane` event is logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `clamAVScanService-dataplane` | `clamAVScanAction` | The antivirus monitoring performs a scan. A log is generated for each line of the original scan output. | * `instanceId` | \n### System log events \nNote \nThe `response` JSON object in the audit log has a `result` field that includes the original system log content. \nThe following `syslog` event is logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `syslog` | `processEvent` | The system log processes an event. | * `instanceId` * `processName` | \n### Process monitor log events \nThe following `monit` events are logged at the workspace level. \n| Service | Action | Description | Request parameters |\n| --- | --- | --- | --- |\n| `monit` | `processNotRunning` | The monitor is not running. | * `instanceId` * `processName` |\n| `monit` | `processRestarting` | The monitor is restarting. | * `instanceId` * `processName` |\n| `monit` | `processStarted` | The monitor started. | * `instanceId` * `processName` |\n| `monit` | `processRunning` | The monitor is running. 
| * `instanceId` * `processName` |\n\n", "chunk_id": "4d0e64754130b4b93d00440c9c25ef86", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Databricks administration introduction\n### Audit log reference\n#### Deprecated log events\n\nDatabricks has deprecated the following audit events: \n* `createAlertDestination` (now `createNotificationDestination`)\n* `deleteAlertDestination` (now `deleteNotificationDestination`)\n* `updateAlertDestination` (now `updateNotificationDestination`) \n### SQL endpoint logs \nIf you create SQL warehouses using the deprecated SQL endpoint API (the former name for SQL warehouses), the corresponding audit event name will include the word `Endpoint` instead of `Warehouse`. Besides the name, these events are identical to the SQL warehouse events. To view descriptions and request parameters of these events, see their corresponding warehouse events in [Databricks SQL events](https://docs.databricks.com/admin/account-settings/audit-logs.html#dbsql). \nThe SQL endpoint events are: \n* `changeEndpointAcls`\n* `createEndpoint`\n* `editEndpoint`\n* `startEndpoint`\n* `stopEndpoint`\n* `deleteEndpoint`\n* `setEndpointConfig`\n\n", "chunk_id": "b019b80761609a82190481cee800a0e2", "url": "https://docs.databricks.com/admin/account-settings/audit-logs.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `year` function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nReturns the year component of `expr`. 
This function is a synonym for `extract(YEAR FROM expr)`.\n\n####### `year` function\n######## Syntax\n\n```\nyear(expr)\n\n```\n\n####### `year` function\n######## Arguments\n\n* `expr`: A DATE or TIMESTAMP expression.\n\n####### `year` function\n######## Returns\n\nAn INTEGER.\n\n####### `year` function\n######## Examples\n\n```\n> SELECT year('2016-07-30');\n2016\n\n```\n\n####### `year` function\n######## Related functions\n\n* [dayofmonth function](https://docs.databricks.com/sql/language-manual/functions/dayofmonth.html)\n* [dayofweek function](https://docs.databricks.com/sql/language-manual/functions/dayofweek.html)\n* [day function](https://docs.databricks.com/sql/language-manual/functions/day.html)\n* [hour function](https://docs.databricks.com/sql/language-manual/functions/hour.html)\n* [minute function](https://docs.databricks.com/sql/language-manual/functions/minute.html)\n* [extract function](https://docs.databricks.com/sql/language-manual/functions/extract.html)\n\n", "chunk_id": "405aa2e8164dc923ea337d8a07e77e22", "url": "https://docs.databricks.com/sql/language-manual/functions/year.html"} +{"chunked_text": "# What is data warehousing on Databricks?\n### Access and manage saved queries\n\nThis article outlines how to use the Databricks UI to access and manage queries.\n\n### Access and manage saved queries\n#### View queries\n\nYou can view queries using the following methods: \n* Click ![Workspace Icon](https://docs.databricks.com/_images/workspace-icon.png) **Workspace** in the sidebar. Queries are viewable, by default, in the **Home** folder. Users can organize queries into folders in the workspace browser along with other Databricks objects.\n* Click ![Queries Icon](https://docs.databricks.com/_images/queries-icon.png) **Queries** in the sidebar. Objects in the **Queries** windows are sorted in reverse chronological order by default. You can reorder the list by clicking the **Created at** column heading. 
Type into the **Filter queries** text box to filter by Name, Tag, or Owner.\n\n### Access and manage saved queries\n#### Organize queries into folders in the workspace browser\n\nYou can organize queries into folders in the [workspace browser](https://docs.databricks.com/workspace/workspace-browser/index.html) along with other Databricks objects. See [Workspace browser](https://docs.databricks.com/workspace/workspace-browser/index.html).\n\n", "chunk_id": "b5d689549ae8c60529d0627d40e89c9d", "url": "https://docs.databricks.com/sql/user/queries/index.html"} +{"chunked_text": "# What is data warehousing on Databricks?\n### Access and manage saved queries\n#### Transfer ownership of a query\n\nYou must be a workspace admin to transfer ownership of a query. Service principals and groups cannot be assigned ownership of a query. You can also transfer ownership using the [Permissions API](https://docs.databricks.com/api/workspace/permissions). \n1. As a workspace admin, log in to your Databricks workspace.\n2. In the sidebar, click **Queries**.\n3. Click a query.\n4. Click the **Share** button at the top right to open the **Sharing** dialog.\n5. Click on the gear icon at the top right and click **Assign new owner**. \n![Assign new owner](https://docs.databricks.com/_images/assign-new-owner.png)\n6. Select the user to assign ownership to.\n7. Click **Confirm**.\n\n", "chunk_id": "e8cf56e4fcd6407afceb5bec1dca7e32", "url": "https://docs.databricks.com/sql/user/queries/index.html"} +{"chunked_text": "# What is data warehousing on Databricks?\n### Access and manage saved queries\n#### Configure query permissions\n\nWorkspace admins and the query creator are automatically granted permissions to control which users can manage and run queries. You must have at least CAN MANAGE permission on a query to share queries. \nQueries support two types of sharing settings: \n* **Run as viewer**: The viewer\u2019s credential is used to run the query. 
The viewer must also have at least CAN USE permissions on the warehouse. \nUsers can only be granted the CAN EDIT permission when the sharing setting is set to Run as viewer.\n* **Run as owner**: The owner\u2019s credential is used to run the query. \nFor more information on query permission levels, see [Query ACLs](https://docs.databricks.com/security/auth-authz/access-control/index.html#query). \n1. In the sidebar, click **Queries**.\n2. Click a query.\n3. Click the ![Share Button](https://docs.databricks.com/_images/share-button.png) button at the top right to open the **Sharing** dialog. \n![Manage query permissions](https://docs.databricks.com/_images/manage-permissions.png)\n4. Follow the steps based on the permission type you want to grant:\n5. Search for and select the groups and users, and assign the permission level.\n6. Click **Add**.\n7. In the **Sharing settings > Credentials** field at the bottom, select either **Run as viewer** or **Run as owner**. \nYou can also copy the link to the query in the Sharing dialog.\n\n", "chunk_id": "4f0b9ec7e82add1e77903ac05f587df1", "url": "https://docs.databricks.com/sql/user/queries/index.html"} +{"chunked_text": "# What is data warehousing on Databricks?\n### Access and manage saved queries\n#### Admin access to all queries\n\nA Databricks workspace admin user has view access to all queries in the workspace. When the **All Queries** tab is selected, a workspace admin can view and delete any queries. However, a workspace admin can\u2019t edit a query when sharing setting credentials are set to **Run as owner**. \nTo view all queries: \n1. Click ![Queries Icon](https://docs.databricks.com/_images/queries-icon.png) **Queries** in the sidebar.\n2. Click the **All queries** tab near the top of the screen.\n\n### Access and manage saved queries\n#### Creating queries in other environments\n\nYou can create queries without the Databricks UI by using the REST API, a JDBC/ODBC connector, or a partner tool. 
\nSee [Use a SQL database tool](https://docs.databricks.com/dev-tools/index-sql.html) to run SQL commands and browse database objects in Databricks. \nYou can also create a query with the [Databricks Terraform provider](https://docs.databricks.com/dev-tools/terraform/index.html) and [databricks\\_sql\\_query](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/sql_query). \nSee [Technology partners](https://docs.databricks.com/integrations/index.html) to learn about partner tools you can use through Partner Connect.\n\n", "chunk_id": "a1f26429cfc7e0841e2650d1979ff219", "url": "https://docs.databricks.com/sql/user/queries/index.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Provision infrastructure\n##### Create clusters, notebooks, and jobs with Terraform\n\nThis article shows how to use the [Databricks Terraform provider](https://docs.databricks.com/dev-tools/terraform/index.html) to create a [cluster](https://docs.databricks.com/compute/index.html), a [notebook](https://docs.databricks.com/notebooks/index.html), and a [job](https://docs.databricks.com/workflows/jobs/create-run-jobs.html) in an existing Databricks [workspace](https://docs.databricks.com/workspace/index.html). \nThis article is a companion to the following Databricks getting started articles: \n* [Run your first ETL workload on Databricks](https://docs.databricks.com/getting-started/etl-quick-start.html), which uses a general-purpose cluster, a Python notebook, and a job to run the notebook.\n* [Get started: Query and visualize data from a notebook](https://docs.databricks.com/getting-started/quick-start.html), which uses a general-purpose cluster and a SQL notebook. \n* [Tutorial: Run an end-to-end lakehouse analytics pipeline](https://docs.databricks.com/getting-started/lakehouse-e2e.html), which uses a cluster that works with Unity Catalog, a Python notebook, and a job to run the notebook. 
 \nYou can also adapt the Terraform configurations in this article to create custom clusters, notebooks, and jobs in your workspaces.\n\n", "chunk_id": "2e74555fddcc19a93adef7ca1033afa8", "url": "https://docs.databricks.com/dev-tools/terraform/cluster-notebook-job.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Provision infrastructure\n##### Create clusters, notebooks, and jobs with Terraform\n###### Step 1: Create and configure the Terraform project\n\n1. Create a Terraform project by following the instructions in the [Requirements](https://docs.databricks.com/dev-tools/terraform/index.html#requirements) section of the Databricks Terraform provider overview article.\n2. To create a cluster, create a file named `cluster.tf`, and add the following content to the file. This content creates a cluster with the smallest amount of resources allowed. This cluster uses the latest Databricks Runtime Long Term Support (LTS) version. \nFor a cluster that works with Unity Catalog: \n```\nvariable \"cluster_name\" {}\nvariable \"cluster_autotermination_minutes\" {}\nvariable \"cluster_num_workers\" {}\nvariable \"cluster_data_security_mode\" {}\n\n# Create the cluster with the \"smallest\" amount\n# of resources allowed.\ndata \"databricks_node_type\" \"smallest\" {\nlocal_disk = true\n}\n\n# Use the latest Databricks Runtime\n# Long Term Support (LTS) version.\ndata \"databricks_spark_version\" \"latest_lts\" {\nlong_term_support = true\n}\n\nresource \"databricks_cluster\" \"this\" {\ncluster_name = var.cluster_name\nnode_type_id = data.databricks_node_type.smallest.id\nspark_version = data.databricks_spark_version.latest_lts.id\nautotermination_minutes = var.cluster_autotermination_minutes\nnum_workers = var.cluster_num_workers\ndata_security_mode = var.cluster_data_security_mode\n}\n\noutput \"cluster_url\" {\nvalue = databricks_cluster.this.url\n}\n\n``` \nFor an all-purpose cluster: \n```\nvariable \"cluster_name\" {\ndescription = 
\"A name for the cluster.\"\ntype = string\ndefault = \"My Cluster\"\n}\n\nvariable \"cluster_autotermination_minutes\" {\ndescription = \"How many minutes before automatically terminating due to inactivity.\"\ntype = number\ndefault = 60\n}\n\nvariable \"cluster_num_workers\" {\ndescription = \"The number of workers.\"\ntype = number\ndefault = 1\n}\n\n# Create the cluster with the \"smallest\" amount\n# of resources allowed.\ndata \"databricks_node_type\" \"smallest\" {\nlocal_disk = true\n}\n\n# Use the latest Databricks Runtime\n# Long Term Support (LTS) version.\ndata \"databricks_spark_version\" \"latest_lts\" {\nlong_term_support = true\n}\n\nresource \"databricks_cluster\" \"this\" {\ncluster_name = var.cluster_name\nnode_type_id = data.databricks_node_type.smallest.id\nspark_version = data.databricks_spark_version.latest_lts.id\nautotermination_minutes = var.cluster_autotermination_minutes\nnum_workers = var.cluster_num_workers\n}\n\noutput \"cluster_url\" {\nvalue = databricks_cluster.this.url\n}\n\n```\n3. To create a cluster, create another file named `cluster.auto.tfvars`, and add the following content to the file. This file contains variable values for customizing the cluster. Replace the placeholder values with your own values. \nFor a cluster that works with Unity Catalog: \n```\ncluster_name = \"My Cluster\"\ncluster_autotermination_minutes = 60\ncluster_num_workers = 1\ncluster_data_security_mode = \"SINGLE_USER\"\n\n``` \nFor an all-purpose cluster: \n```\ncluster_name = \"My Cluster\"\ncluster_autotermination_minutes = 60\ncluster_num_workers = 1\n\n```\n4. 
To create a notebook, create another file named `notebook.tf`, and add the following content to the file: \n```\nvariable \"notebook_subdirectory\" {\ndescription = \"A name for the subdirectory to store the notebook.\"\ntype = string\ndefault = \"Terraform\"\n}\n\nvariable \"notebook_filename\" {\ndescription = \"The notebook's filename.\"\ntype = string\n}\n\nvariable \"notebook_language\" {\ndescription = \"The language of the notebook.\"\ntype = string\n}\n\nresource \"databricks_notebook\" \"this\" {\npath = \"${data.databricks_current_user.me.home}/${var.notebook_subdirectory}/${var.notebook_filename}\"\nlanguage = var.notebook_language\nsource = \"./${var.notebook_filename}\"\n}\n\noutput \"notebook_url\" {\nvalue = databricks_notebook.this.url\n}\n\n```\n5. If you are creating a notebook, save the following notebook code to a file in the same directory as the `notebook.tf` file: \nFor the Python notebook for [Run your first ETL workload on Databricks](https://docs.databricks.com/getting-started/etl-quick-start.html), a file named `notebook-getting-started-etl-quick-start.py` with the following contents: \n```\n# Databricks notebook source\n# Import functions\nfrom pyspark.sql.functions import col, current_timestamp\n\n# Define variables used in code below\nfile_path = \"/databricks-datasets/structured-streaming/events\"\nusername = spark.sql(\"SELECT regexp_replace(current_user(), '[^a-zA-Z0-9]', '_')\").first()[0]\ntable_name = f\"{username}_etl_quickstart\"\ncheckpoint_path = f\"/tmp/{username}/_checkpoint/etl_quickstart\"\n\n# Clear out data from previous demo execution\nspark.sql(f\"DROP TABLE IF EXISTS {table_name}\")\ndbutils.fs.rm(checkpoint_path, True)\n\n# Configure Auto Loader to ingest JSON data to a Delta table\n(spark.readStream\n.format(\"cloudFiles\")\n.option(\"cloudFiles.format\", \"json\")\n.option(\"cloudFiles.schemaLocation\", checkpoint_path)\n.load(file_path)\n.select(\"*\", col(\"_metadata.file_path\").alias(\"source_file\"), 
current_timestamp().alias(\"processing_time\"))\n.writeStream\n.option(\"checkpointLocation\", checkpoint_path)\n.trigger(availableNow=True)\n.toTable(table_name))\n\n# COMMAND ----------\n\ndf = spark.read.table(table_name)\n\n# COMMAND ----------\n\ndisplay(df)\n\n``` \nFor the SQL notebook for [Get started: Query and visualize data from a notebook](https://docs.databricks.com/getting-started/quick-start.html), a file named `notebook-getting-started-quick-start.sql` with the following contents: \n```\n-- Databricks notebook source\n-- MAGIC %python\n-- MAGIC diamonds = (spark.read\n-- MAGIC .format(\"csv\")\n-- MAGIC .option(\"header\", \"true\")\n-- MAGIC .option(\"inferSchema\", \"true\")\n-- MAGIC .load(\"/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv\")\n-- MAGIC )\n-- MAGIC\n-- MAGIC diamonds.write.format(\"delta\").save(\"/mnt/delta/diamonds\")\n\n-- COMMAND ----------\n\nDROP TABLE IF EXISTS diamonds;\n\nCREATE TABLE diamonds USING DELTA LOCATION '/mnt/delta/diamonds/'\n\n-- COMMAND ----------\n\nSELECT color, avg(price) AS price FROM diamonds GROUP BY color ORDER BY COLOR\n\n``` \nFor the Python notebook for [Tutorial: Run an end-to-end lakehouse analytics pipeline](https://docs.databricks.com/getting-started/lakehouse-e2e.html), a file named `notebook-getting-started-lakehouse-e2e.py` with the following contents: \n```\n# Databricks notebook source\nexternal_location = \"\"\ncatalog = \"\"\n\ndbutils.fs.put(f\"{external_location}/foobar.txt\", \"Hello world!\", True)\ndisplay(dbutils.fs.head(f\"{external_location}/foobar.txt\"))\ndbutils.fs.rm(f\"{external_location}/foobar.txt\")\n\ndisplay(spark.sql(f\"SHOW SCHEMAS IN {catalog}\"))\n\n# COMMAND ----------\n\nfrom pyspark.sql.functions import col\n\n# Set parameters for isolation in workspace and reset demo\nusername = spark.sql(\"SELECT regexp_replace(current_user(), '[^a-zA-Z0-9]', '_')\").first()[0]\ndatabase = f\"{catalog}.e2e_lakehouse_{username}_db\"\nsource = 
f\"{external_location}/e2e-lakehouse-source\"\ntable = f\"{database}.target_table\"\ncheckpoint_path = f\"{external_location}/_checkpoint/e2e-lakehouse-demo\"\n\nspark.sql(f\"SET c.username='{username}'\")\nspark.sql(f\"SET c.database={database}\")\nspark.sql(f\"SET c.source='{source}'\")\n\nspark.sql(\"DROP DATABASE IF EXISTS ${c.database} CASCADE\")\nspark.sql(\"CREATE DATABASE ${c.database}\")\nspark.sql(\"USE ${c.database}\")\n\n# Clear out data from previous demo execution\ndbutils.fs.rm(source, True)\ndbutils.fs.rm(checkpoint_path, True)\n\n# Define a class to load batches of data to source\nclass LoadData:\n\n    def __init__(self, source):\n        self.source = source\n\n    def get_date(self):\n        # Start from the first day if nothing has been landed yet\n        try:\n            df = spark.read.format(\"json\").load(self.source)\n        except Exception:\n            return \"2016-01-01\"\n        batch_date = df.selectExpr(\"max(distinct(date(tpep_pickup_datetime))) + 1 day\").first()[0]\n        if batch_date.month == 3:\n            raise Exception(\"Source data exhausted\")\n        return batch_date\n\n    def get_batch(self, batch_date):\n        return (\n            spark.table(\"samples.nyctaxi.trips\")\n            .filter(col(\"tpep_pickup_datetime\").cast(\"date\") == batch_date)\n        )\n\n    def write_batch(self, batch):\n        batch.write.format(\"json\").mode(\"append\").save(self.source)\n\n    def land_batch(self):\n        batch_date = self.get_date()\n        batch = self.get_batch(batch_date)\n        self.write_batch(batch)\n\nRawData = LoadData(source)\n\n# COMMAND ----------\n\nRawData.land_batch()\n\n# COMMAND ----------\n\n# Import functions\nfrom pyspark.sql.functions import col, current_timestamp\n\n# Configure Auto Loader to ingest JSON data to a Delta table\n(spark.readStream\n.format(\"cloudFiles\")\n.option(\"cloudFiles.format\", \"json\")\n.option(\"cloudFiles.schemaLocation\", checkpoint_path)\n.load(source)\n.select(\"*\", col(\"_metadata.file_path\").alias(\"source_file\"), current_timestamp().alias(\"processing_time\"))\n.writeStream\n.option(\"checkpointLocation\", checkpoint_path)\n.trigger(availableNow=True)\n.option(\"mergeSchema\", 
\"true\")\n.toTable(table))\n\n# COMMAND ----------\n\ndf = spark.read.table(table)\n\n# COMMAND ----------\n\ndisplay(df)\n\n```\n6. If you are creating a notebook, create another file named `notebook.auto.tfvars`, and add the following content to the file. This file contains variable values for customizing the notebook configuration. \nFor the Python notebook for [Run your first ETL workload on Databricks](https://docs.databricks.com/getting-started/etl-quick-start.html): \n```\nnotebook_subdirectory = \"Terraform\"\nnotebook_filename = \"notebook-getting-started-etl-quick-start.py\"\nnotebook_language = \"PYTHON\"\n\n``` \nFor the SQL notebook for [Get started: Query and visualize data from a notebook](https://docs.databricks.com/getting-started/quick-start.html): \n```\nnotebook_subdirectory = \"Terraform\"\nnotebook_filename = \"notebook-getting-started-quick-start.sql\"\nnotebook_language = \"SQL\"\n\n``` \nFor the Python notebook for [Tutorial: Run an end-to-end lakehouse analytics pipeline](https://docs.databricks.com/getting-started/lakehouse-e2e.html): \n```\nnotebook_subdirectory = \"Terraform\"\nnotebook_filename = \"notebook-getting-started-lakehouse-e2e.py\"\nnotebook_language = \"PYTHON\"\n\n```\n7. If you are creating a notebook, set up in your Databricks workspace any requirements that the notebook needs to run successfully. See the instructions for: \n* The Python notebook for [Run your first ETL workload on Databricks](https://docs.databricks.com/getting-started/etl-quick-start.html)\n* The SQL notebook for [Get started: Query and visualize data from a notebook](https://docs.databricks.com/getting-started/quick-start.html)\n* The Python notebook for [Tutorial: Run an end-to-end lakehouse analytics pipeline](https://docs.databricks.com/getting-started/lakehouse-e2e.html)\n8. To create the job, create another file named `job.tf`, and add the following content to the file. 
This content creates a job to run the notebook. \n```\nvariable \"job_name\" {\ndescription = \"A name for the job.\"\ntype = string\ndefault = \"My Job\"\n}\n\nvariable \"task_key\" {\ndescription = \"A name for the task.\"\ntype = string\ndefault = \"my_task\"\n}\n\nresource \"databricks_job\" \"this\" {\nname = var.job_name\ntask {\ntask_key = var.task_key\nexisting_cluster_id = databricks_cluster.this.cluster_id\nnotebook_task {\nnotebook_path = databricks_notebook.this.path\n}\n}\nemail_notifications {\non_success = [ data.databricks_current_user.me.user_name ]\non_failure = [ data.databricks_current_user.me.user_name ]\n}\n}\n\noutput \"job_url\" {\nvalue = databricks_job.this.url\n}\n\n```\n9. If you are creating a job, create another file named `job.auto.tfvars`, and add the following content to the file. This file contains a variable value for customizing the job configuration. \n```\njob_name = \"My Job\"\ntask_key = \"my_task\"\n\n```\n\n", "chunk_id": "b7ac1f951cf2202fc3f038655ec75dc7", "url": "https://docs.databricks.com/dev-tools/terraform/cluster-notebook-job.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Provision infrastructure\n##### Create clusters, notebooks, and jobs with Terraform\n###### Step 2: Run the configurations\n\nIn this step, you run the Terraform configurations to deploy the cluster, the notebook, and the job into your Databricks workspace. \n1. Check to see whether your Terraform configurations are valid by running the `terraform validate` command. If any errors are reported, fix them, and run the command again. \n```\nterraform validate\n\n```\n2. Check to see what Terraform will do in your workspace, before Terraform actually does it, by running the `terraform plan` command. \n```\nterraform plan\n\n```\n3. Deploy the cluster, the notebook, and the job into your workspace by running the `terraform apply` command. When prompted to deploy, type `yes` and press **Enter**. 
\n```\nterraform apply\n\n``` \nTerraform deploys the resources that are specified in your project. Deploying these resources (especially a cluster) can take several minutes.\n\n", "chunk_id": "69fed615d1dfbf0da3d816e9ce84afac", "url": "https://docs.databricks.com/dev-tools/terraform/cluster-notebook-job.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Provision infrastructure\n##### Create clusters, notebooks, and jobs with Terraform\n###### Step 3: Explore the results\n\n1. If you created a cluster, in the output of the `terraform apply` command, copy the link next to `cluster_url`, and paste it into your web browser\u2019s address bar.\n2. If you created a notebook, in the output of the `terraform apply` command, copy the link next to `notebook_url`, and paste it into your web browser\u2019s address bar. \nNote \nBefore you use the notebook, you might need to customize its contents. See the related documentation about how to customize the notebook.\n3. If you created a job, in the output of the `terraform apply` command, copy the link next to `job_url`, and paste it into your web browser\u2019s address bar. \nNote \nBefore you run the notebook, you might need to customize its contents. See the links at the beginning of this article for related documentation about how to customize the notebook.\n4. If you created a job, run the job as follows: \n1. Click **Run now** on the job page.\n2. After the job finishes running, to view the job run\u2019s results, in the **Completed runs (past 60 days)** list on the job page, click the most recent time entry in the **Start time** column. The **Output** pane shows the result of running the notebook\u2019s code.\n\n##### Create clusters, notebooks, and jobs with Terraform\n###### Step 4: Clean up\n\nIn this step, you delete the preceding resources from your workspace. \n1. 
Check to see what Terraform will do in your workspace, before Terraform actually does it, by running the `terraform plan` command. \n```\nterraform plan\n\n```\n2. Delete the cluster, the notebook, and the job from your workspace by running the `terraform destroy` command. When prompted to delete, type `yes` and press **Enter**. \n```\nterraform destroy\n\n``` \nTerraform deletes the resources that are specified in your project.\n\n", "chunk_id": "65fa7f8bcf07723b29347da5a0b728d7", "url": "https://docs.databricks.com/dev-tools/terraform/cluster-notebook-job.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL connector\n#### driver\n##### or API\n###### Databricks ODBC and JDBC Drivers\n####### Databricks JDBC Driver\n", "chunk_id": "9b1a62897773c0e1069af050178feaff", "url": "https://docs.databricks.com/integrations/jdbc/authentication.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL connector\n#### driver\n##### or API\n###### Databricks ODBC and JDBC Drivers\n####### Databricks JDBC Driver\n######### Authentication settings for the Databricks JDBC Driver\n\nThis article describes how to configure Databricks authentication settings for the [Databricks JDBC Driver](https://docs.databricks.com/integrations/jdbc/index.html). \nTo configure a Databricks connection for the Databricks JDBC Driver, you must combine your compute resource settings, any driver capability settings, and the following authentication settings, into a JDBC connection URL or programmatic collection of JDBC connection properties. 
\nJDBC connection URLs use the following format: \n```\njdbc:databricks://<server-hostname>:443;httpPath=<http-path>[;<setting1>=<value1>;<setting2>=<value2>;<settingN>=<valueN>]\n\n``` \n* To get the values for `<server-hostname>` and `<http-path>`, see [Compute settings for the Databricks JDBC Driver](https://docs.databricks.com/integrations/jdbc/compute.html).\n* Replace `<setting>=<value>` as needed for each of the connection properties as listed in the following sections.\n* You can also add special or advanced [driver capability settings](https://docs.databricks.com/integrations/jdbc/capability.html). \nProgrammatic collections of JDBC connection properties can be used in Java code such as the following example: \n```\npackage org.example;\n\nimport java.sql.Connection;\nimport java.sql.DriverManager;\nimport java.sql.ResultSet;\nimport java.sql.ResultSetMetaData;\nimport java.sql.Statement;\nimport java.util.Properties;\n\npublic class Main {\npublic static void main(String[] args) throws Exception {\nClass.forName(\"com.databricks.client.jdbc.Driver\");\nString url = \"jdbc:databricks://\" + System.getenv(\"DATABRICKS_SERVER_HOSTNAME\") + \":443\";\nProperties p = new java.util.Properties();\np.put(\"httpPath\", System.getenv(\"DATABRICKS_HTTP_PATH\"));\np.put(\"<setting1>\", \"<value1>\");\np.put(\"<setting2>\", \"<value2>\");\np.put(\"<settingN>\", \"<valueN>\");\ntry (Connection conn = DriverManager.getConnection(url, p)) {\nStatement stmt = conn.createStatement();\ntry (ResultSet rs = stmt.executeQuery(\"<query>\")) {\nResultSetMetaData md = rs.getMetaData();\nString[] columns = new String[md.getColumnCount()];\nfor (int i = 0; i < columns.length; i++) {\ncolumns[i] = md.getColumnName(i + 1);\n}\nwhile (rs.next()) {\nSystem.out.print(\"Row \" + rs.getRow() + \"=[\");\nfor (int i = 0; i < columns.length; i++) {\nif (i != 0) {\nSystem.out.print(\", \");\n}\nSystem.out.print(columns[i] + \"='\" + rs.getObject(i + 1) + \"'\");\n}\nSystem.out.println(\"]\");\n}\n}\n}\nSystem.exit(0);\n}\n}\n\n``` \n* Set the `DATABRICKS_SERVER_HOSTNAME` and `DATABRICKS_HTTP_PATH` environment values to the target Databricks compute resource\u2019s **Server Hostname** and **HTTP Path** values, respectively. 
To get these values, see [Compute settings for the Databricks JDBC Driver](https://docs.databricks.com/integrations/jdbc/compute.html). To set environment variables, see your operating system\u2019s documentation.\n* Replace `<setting>` and `<value>` as needed for each of the connection properties as listed in the following sections.\n* You can also add special or advanced [driver capability settings](https://docs.databricks.com/integrations/jdbc/capability.html), typically as additional `<setting>` and `<value>` pairs.\n* For this example, replace `<query>` with a SQL `SELECT` query string. \nWhether you use a connection URL or a collection of connection properties depends on the requirements of your target app, tool, client, SDK, or API. This article provides examples of JDBC connection URLs and programmatic collections of JDBC connection properties for each supported Databricks authentication type. \nThe Databricks JDBC Driver supports the following Databricks authentication types: \n* [Databricks personal access token](https://docs.databricks.com/integrations/jdbc/authentication.html#authentication-pat)\n* [Databricks username and password](https://docs.databricks.com/integrations/jdbc/authentication.html#authentication-username-password)\n* [OAuth 2.0 tokens](https://docs.databricks.com/integrations/jdbc/authentication.html#authentication-pass-through)\n* [OAuth user-to-machine (U2M) authentication](https://docs.databricks.com/integrations/jdbc/authentication.html#authentication-u2m)\n* [OAuth machine-to-machine (M2M) authentication](https://docs.databricks.com/integrations/jdbc/authentication.html#authentication-m2m)\n\n", "chunk_id": "977c49dffb5eb4b9a512aca3f64925be", "url": "https://docs.databricks.com/integrations/jdbc/authentication.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL connector\n#### driver\n##### or API\n###### Databricks ODBC and JDBC Drivers\n####### Databricks JDBC Driver\n######### Authentication settings for the 
Databricks JDBC Driver\n########## Databricks personal access token\n\nTo create a Databricks personal access token, do the following: \n1. In your Databricks workspace, click your Databricks username in the top bar, and then select **Settings** from the drop down.\n2. Click **Developer**.\n3. Next to **Access tokens**, click **Manage**.\n4. Click **Generate new token**.\n5. (Optional) Enter a comment that helps you to identify this token in the future, and change the token\u2019s default lifetime of 90 days. To create a token with no lifetime (not recommended), leave the **Lifetime (days)** box empty (blank).\n6. Click **Generate**.\n7. Copy the displayed token to a secure location, and then click **Done**. \nNote \nBe sure to save the copied token in a secure location. Do not share your copied token with others. If you lose the copied token, you cannot regenerate that exact same token. Instead, you must repeat this procedure to create a new token. If you lose the copied token, or you believe that the token has been compromised, Databricks strongly recommends that you immediately delete that token from your workspace by clicking the trash can (**Revoke**) icon next to the token on the **Access tokens** page. \nIf you are not able to create or use tokens in your workspace, this might be because your workspace administrator has disabled tokens or has not given you permission to create or use tokens. See your workspace administrator or the following: \n* [Enable or disable personal access token authentication for the workspace](https://docs.databricks.com/admin/access-control/tokens.html#enable-tokens)\n* [Personal access token permissions](https://docs.databricks.com/security/auth-authz/api-access-permissions.html#pat) \nTo authenticate using a Databricks personal access token, set the following configuration. 
\nFor a JDBC connection URL with embedded general configuration properties and sensitive credential properties: \n```\njdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>\n\n``` \nFor Java code with general configuration properties and sensitive credential properties set outside of the JDBC connection URL: \n```\n// ...\nString url = \"jdbc:databricks://<server-hostname>:443\";\nProperties p = new java.util.Properties();\np.put(\"httpPath\", \"<http-path>\");\np.put(\"AuthMech\", \"3\");\np.put(\"UID\", \"token\");\np.put(\"PWD\", \"<personal-access-token>\");\n// ...\nConnection conn = DriverManager.getConnection(url, p);\n// ...\n\n``` \n* To adapt the preceding code snippet to your own needs, see the complete Java code example at the beginning of this article.\n* In the preceding URL or Java code, replace `<personal-access-token>` with the Databricks personal access token for your workspace user.\n* To get the values for `<server-hostname>` and `<http-path>`, see [Compute settings for the Databricks JDBC Driver](https://docs.databricks.com/integrations/jdbc/compute.html).\n\n", "chunk_id": "7357ece04971c67ea14881edad255cd9", "url": "https://docs.databricks.com/integrations/jdbc/authentication.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL connector\n#### driver\n##### or API\n###### Databricks ODBC and JDBC Drivers\n####### Databricks JDBC Driver\n######### Authentication settings for the Databricks JDBC Driver\n########## Databricks username and password\n\nDatabricks username and password authentication is also known as Databricks *basic* authentication. \nUsername and password authentication is possible only if [single sign-on](https://docs.databricks.com/admin/users-groups/single-sign-on/index.html) is disabled. \nTo authenticate using a Databricks username and password, set the following configuration. 
\nFor a JDBC connection URL with embedded general configuration properties and sensitive credential properties: \n```\njdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=3;UID=<username>;PWD=<password>\n\n``` \nFor Java code with general configuration properties and sensitive credential properties set outside of the JDBC connection URL: \n```\n// ...\nString url = \"jdbc:databricks://<server-hostname>:443\";\nProperties p = new java.util.Properties();\np.put(\"httpPath\", \"<http-path>\");\np.put(\"AuthMech\", \"3\");\np.put(\"UID\", \"<username>\");\np.put(\"PWD\", \"<password>\");\n// ...\nConnection conn = DriverManager.getConnection(url, p);\n// ...\n\n``` \n* To adapt the preceding code snippet to your own needs, see the complete Java code example at the beginning of this article.\n* In the preceding URL or Java code, replace `<username>` and `<password>` with the username and password.\n* To get the values for `<server-hostname>` and `<http-path>`, see [Compute settings for the Databricks JDBC Driver](https://docs.databricks.com/integrations/jdbc/compute.html). \nFor more information, see the `Using User Name and Password` section in the [Databricks JDBC Driver Guide](https://docs.databricks.com/_extras/documents/Databricks-JDBC-Driver-Install-and-Configuration-Guide.pdf).\n\n", "chunk_id": "8ad9ffb41845b92226918b4bb46d4a9a", "url": "https://docs.databricks.com/integrations/jdbc/authentication.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL connector\n#### driver\n##### or API\n###### Databricks ODBC and JDBC Drivers\n####### Databricks JDBC Driver\n######### Authentication settings for the Databricks JDBC Driver\n########## OAuth 2.0 tokens\n\nJDBC driver 2.6.36 and above supports an OAuth 2.0 token for a Databricks user or service principal. This is also known as OAuth 2.0 *token pass-through* authentication. 
\nTo create an OAuth 2.0 token for token pass-through authentication, do the following: \n* For a user, you can use the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/install.html) to generate the OAuth 2.0 token by initiating the OAuth U2M process, and then get the generated OAuth 2.0 token by running the `databricks auth token` command. See [OAuth user-to-machine (U2M) authentication](https://docs.databricks.com/dev-tools/cli/authentication.html#u2m-auth). OAuth 2.0 tokens have a default lifetime of 1 hour. To generate a new OAuth 2.0 token, repeat this process.\n* For a service principal, see [Manually generate and use access tokens for OAuth machine-to-machine (M2M) authentication](https://docs.databricks.com/dev-tools/auth/oauth-m2m.html#oauth-m2m-manual). Make a note of the service principal\u2019s OAuth `access_token` value. OAuth 2.0 tokens have a default lifetime of 1 hour. To generate a new OAuth 2.0 token, repeat this process. \nTo authenticate using OAuth 2.0 token pass-through authentication, set the following configuration. 
\nFor a JDBC connection URL with embedded general configuration properties and sensitive credential properties: \n```\njdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=11;Auth_Flow=0;Auth_AccessToken=<oauth-token>\n\n``` \nFor Java code with general configuration properties and sensitive credential properties set outside of the JDBC connection URL: \n```\n// ...\nString url = \"jdbc:databricks://<server-hostname>:443\";\nProperties p = new java.util.Properties();\np.put(\"httpPath\", \"<http-path>\");\np.put(\"AuthMech\", \"11\");\np.put(\"Auth_Flow\", \"0\");\np.put(\"Auth_AccessToken\", \"<oauth-token>\");\n// ...\nConnection conn = DriverManager.getConnection(url, p);\n// ...\n\n``` \n* To adapt the preceding code snippet to your own needs, see the complete Java code example at the beginning of this article.\n* In the preceding URL or Java code, replace `<oauth-token>` with the OAuth 2.0 token.\n* To get the values for `<server-hostname>` and `<http-path>`, see [Compute settings for the Databricks JDBC Driver](https://docs.databricks.com/integrations/jdbc/compute.html). \nFor more information, see the `Token Pass-through` section in the [Databricks JDBC Driver Guide](https://docs.databricks.com/_extras/documents/Databricks-JDBC-Driver-Install-and-Configuration-Guide.pdf).\n\n", "chunk_id": "d2c248cf24a0dcf3dfa54f07c8e16880", "url": "https://docs.databricks.com/integrations/jdbc/authentication.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL connector\n#### driver\n##### or API\n###### Databricks ODBC and JDBC Drivers\n####### Databricks JDBC Driver\n######### Authentication settings for the Databricks JDBC Driver\n########## OAuth user-to-machine (U2M) authentication\n\nJDBC driver 2.6.36 and above supports OAuth user-to-machine (U2M) authentication for a Databricks user. This is also known as OAuth 2.0 *browser-based* authentication. \nOAuth U2M or OAuth 2.0 browser-based authentication has no prerequisites. OAuth 2.0 tokens have a default lifetime of 1 hour. 
OAuth U2M or OAuth 2.0 browser-based authentication should refresh expired OAuth 2.0 tokens for you automatically. \nNote \nOAuth U2M or OAuth 2.0 browser-based authentication works only with applications that run locally. It does not work with server-based or cloud-based applications. \nTo authenticate using OAuth user-to-machine (U2M) or OAuth 2.0 browser-based authentication, set the following configuration. \nFor a JDBC connection URL with embedded general configuration properties and sensitive credential properties: \n```\njdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=11;Auth_Flow=2;TokenCachePassPhrase=<passphrase>;EnableTokenCache=0\n\n``` \nFor Java code with general configuration properties and sensitive credential properties set outside of the JDBC connection URL: \n```\n// ...\nString url = \"jdbc:databricks://<server-hostname>:443\";\nProperties p = new java.util.Properties();\np.put(\"httpPath\", \"<http-path>\");\np.put(\"AuthMech\", \"11\");\np.put(\"Auth_Flow\", \"2\");\np.put(\"TokenCachePassPhrase\", \"<passphrase>\");\np.put(\"EnableTokenCache\", \"0\");\n// ...\nConnection conn = DriverManager.getConnection(url, p);\n// ...\n\n``` \n* To adapt the preceding code snippet to your own needs, see the complete Java code example at the beginning of this article.\n* In the preceding URL or Java code, replace `<passphrase>` with a passphrase of your choice. The driver uses this passphrase to encrypt the refresh token.\n* To get the values for `<server-hostname>` and `<http-path>`, see [Compute settings for the Databricks JDBC Driver](https://docs.databricks.com/integrations/jdbc/compute.html). 
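The connection URLs in this article all follow one pattern: host and port, then `httpPath`, then `name=value` pairs separated by semicolons. As a minimal sketch of that pattern, the Python snippet below assembles the OAuth U2M URL from its parts. The helper `build_jdbc_url`, the hostname, the HTTP path, and the passphrase are hypothetical placeholders chosen for illustration; they are not part of the Databricks JDBC Driver, and only the property names come from this section.

```python
# Hypothetical helper (not part of the Databricks JDBC Driver): joins the
# host, the HTTP path, and extra properties into the URL shape shown above.
def build_jdbc_url(server_hostname, http_path, properties):
    parts = [f"jdbc:databricks://{server_hostname}:443", f"httpPath={http_path}"]
    # Each property becomes one name=value pair, kept in insertion order.
    parts.extend(f"{name}={value}" for name, value in properties.items())
    return ";".join(parts)


# OAuth U2M settings as listed in this section; the passphrase is a placeholder.
u2m_properties = {
    "AuthMech": "11",
    "Auth_Flow": "2",
    "TokenCachePassPhrase": "example-passphrase",
    "EnableTokenCache": "0",
}

# The hostname and HTTP path below are made-up example values.
url = build_jdbc_url(
    "example.cloud.databricks.com",
    "/sql/1.0/warehouses/abc123",
    u2m_properties,
)
print(url)
```

Keeping the properties in a dictionary makes it easy to swap in the settings for another authentication type, such as `Auth_Flow` set to `1` with `OAuth2ClientId` and `OAuth2Secret` for M2M. In real code, read secrets from the environment or a secret store rather than hard-coding them.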
\nFor more information, see the `Using Browser Based Authentication` section in the [Databricks JDBC Driver Guide](https://docs.databricks.com/_extras/documents/Databricks-JDBC-Driver-Install-and-Configuration-Guide.pdf).\n\n", "chunk_id": "a11aa6af7d6bab3ff2ea9b0c3d99c0b6", "url": "https://docs.databricks.com/integrations/jdbc/authentication.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### Use a SQL connector\n#### driver\n##### or API\n###### Databricks ODBC and JDBC Drivers\n####### Databricks JDBC Driver\n######### Authentication settings for the Databricks JDBC Driver\n########## OAuth machine-to-machine (M2M) authentication\n\nJDBC driver 2.6.36 and above supports OAuth machine-to-machine (M2M) authentication for a Databricks service principal. This is also known as OAuth 2.0 *client credentials* authentication. \nNote \nThe JDBC driver cannot currently connect using M2M authentication for private link workspaces. \nTo configure OAuth M2M or OAuth 2.0 client credentials authentication, do the following: \n1. Create a Databricks service principal in your Databricks workspace, and create an OAuth secret for that service principal. \nTo create the service principal and its OAuth secret, see [OAuth machine-to-machine (M2M) authentication](https://docs.databricks.com/dev-tools/auth/oauth-m2m.html). Make a note of the service principal\u2019s **UUID** or **Application ID** value, and the **Secret** value for the service principal\u2019s OAuth secret.\n2. Give the service principal access to your cluster or warehouse. See [Compute permissions](https://docs.databricks.com/compute/clusters-manage.html#cluster-level-permissions) or [Manage a SQL warehouse](https://docs.databricks.com/compute/sql-warehouse/create.html#manage). \nTo authenticate using OAuth machine-to-machine (M2M) or OAuth 2.0 client credentials authentication, set the following configuration. 
\nFor a JDBC connection URL with embedded general configuration properties and sensitive credential properties: \n```\njdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=11;Auth_Flow=1;OAuth2ClientId=<service-principal-application-id>;OAuth2Secret=<service-principal-oauth-secret>\n\n``` \nFor Java code with general configuration properties and sensitive credential properties set outside of the JDBC connection URL: \n```\n// ...\nString url = \"jdbc:databricks://<server-hostname>:443\";\nProperties p = new java.util.Properties();\np.put(\"httpPath\", \"<http-path>\");\np.put(\"AuthMech\", \"11\");\np.put(\"Auth_Flow\", \"1\");\np.put(\"OAuth2ClientId\", \"<service-principal-application-id>\");\np.put(\"OAuth2Secret\", \"<service-principal-oauth-secret>\");\n// ...\nConnection conn = DriverManager.getConnection(url, p);\n// ...\n\n``` \n* To adapt the preceding code snippet to your own needs, see the complete Java code example at the beginning of this article.\n* In the preceding URL or Java code, replace the following placeholders: \n+ Replace `<service-principal-application-id>` with the service principal\u2019s **UUID**/**Application ID** value.\n+ Replace `<service-principal-oauth-secret>` with the service principal\u2019s OAuth **Secret** value.\n+ To get the values for `<server-hostname>` and `<http-path>`, see [Compute settings for the Databricks JDBC Driver](https://docs.databricks.com/integrations/jdbc/compute.html). \nFor more information, see the `Using M2M Based Authentication` section in the [Databricks JDBC Driver Guide](https://docs.databricks.com/_extras/documents/Databricks-JDBC-Driver-Install-and-Configuration-Guide.pdf).\n\n", "chunk_id": "8642b2c68b6d336a2283a648342068a5", "url": "https://docs.databricks.com/integrations/jdbc/authentication.html"} +{"chunked_text": "# Databricks documentation archive\n### Best practices: Compute policies\n\nWarning \nThis article has been archived and may no longer reflect the current state of the product. For information about compute policies, see [Create and manage compute policies](https://docs.databricks.com/admin/clusters/policies.html). 
\nDatabricks [compute policies](https://docs.databricks.com/admin/clusters/policies.html) provide administrators control over the creation of compute resources in a Databricks workspace. Effective use of compute policies allows administrators to: \n* Enforce standardized compute configurations.\n* Prevent excessive use of resources and control spending.\n* Ensure accurate chargeback by correctly tagging compute resources.\n* Facilitate analysis and processing by providing users with pre-configured compute configurations targeted at specific workloads. \nCombined with effective onboarding, approval, and chargeback processes, compute policies can be a foundational component in Databricks platform governance. This guide presents recommendations and best practices to help you create a successful plan for integrating compute policies into your governance framework. \nSince governance is unique to each organization\u2019s requirements and existing governance infrastructure, this article begins by covering recommendations that apply commonly to compute policies. The last section of this article discusses specific strategies to address challenges you might see in your environment. \nThis article discusses the following best practices and recommendations to ensure a successful compute governance rollout: \n* Create a plan for introducing compute policies in phases to help users transition to a governed environment.\n* Create a plan for communicating changes for each phase of the compute policies rollout.\n* Identify compute governance challenges and implement strategies to address those challenges.\n\n", "chunk_id": "68682ee66f5e14cc3d73ca6596a77ee4", "url": "https://docs.databricks.com/archive/compute/policies-best-practices.html"} +{"chunked_text": "# Databricks documentation archive\n### Best practices: Compute policies\n#### Compute policies rollout\n\nImplementing compute policies can present a significant change to the user experience. 
Databricks recommends a phased approach to help guide users through the transition: \n* Communicate the upcoming changes and provide users an opportunity to test compute configurations.\n* Perform a soft rollout.\n* Incrementally introduce further policy changes.\n* Perform a hard cutover to an entirely governed environment. \nA phased rollout allows users to familiarize themselves with the new policies and helps prevent disruption to existing workloads. The following diagram is an example of this recommended process: \n![Compute policies rollout plan](https://docs.databricks.com/_images/policies-rollout-stages.png) \nThe following sections provide more detailed information on these stages: \n* [Communicate and test compute policies](https://docs.databricks.com/archive/compute/policies-best-practices.html#communicate-and-test-compute-policies)\n* [Considerations for introducing compute policies](https://docs.databricks.com/archive/compute/policies-best-practices.html#considerations-for-introducing-compute-policies)\n* [Final rollout](https://docs.databricks.com/archive/compute/policies-best-practices.html#final-rollout) \n### [Communicate and test compute policies](https://docs.databricks.com/archive/compute/policies-best-practices.html#id7) \nBegin the process by communicating the upcoming changes to users. The communication plan should include: \n* Details on changes that are coming.\n* Why these changes are happening.\n* What users will need to do to ensure successful transitioning of workloads.\n* How to provide feedback about the changes.\n* A timeline for each stage of the rollout.\n* At the start of each stage of the phased rollout, communicate further details relevant to that stage. 
\nThe following diagram provides an example communication plan for a phased rollout: \n![Compute policies communication plan](https://docs.databricks.com/_images/policies-communication-plan.png) \nYour plan might have different stages depending on your environment and compute policies strategy. This example includes four stages: \n* Stage 1 includes communicating the plan to users and beginning of testing. Users must have an opportunity to test their current and anticipated workloads on compute that conforms to the new policies. You want to identify any issues with existing and planned workloads early in the process.\n* Stage 2 continues testing along with the rollout of a compute tagging policy.\n* Stage 3 introduces compute types, in this case specifying compute using T-shirt sizes, for example, small, large, or extra-large compute types.\n* Stage 4 is the final rollout of compute policies along with complete user documentation. \nUsers should also have the opportunity to test their workloads with the planned compute configurations in the initial stage. This testing can help identify existing workloads that have issues running with the proposed policies. \n### [Considerations for introducing compute policies](https://docs.databricks.com/archive/compute/policies-best-practices.html#id8) \nConsider your current management policies when planning the initial deployment of compute policies. In particular, consider whether you\u2019re moving from an environment where users are restricted from creating compute or a more open environment. \n#### Restrictive environment \nIn the case of an environment where users haven\u2019t had permissions to create compute, begin by rolling out restrictive policies along with an enablement plan for users. An enablement plan might be computer-based training, workshops, or documentation. Providing users with guidance on best practices for configuring compute will improve their ability to take full advantage of the platform. 
Policies can be relaxed as users demonstrate compliance and competence with the platform. \n#### Unrestricted environment \nApplying policies can be more challenging in an unrestricted environment. Some existing use cases and compute will nearly always fall outside of the new policy\u2019s constraints, so identifying these in a testing or soft rollout stage is crucial. \nUsers with compute create permissions or access to the unrestricted policy will maintain their access to this policy throughout the soft rollout to ensure all workloads continue to function. Users should use the soft rollout to test all of their workloads with the new policies that will be made available to them. \nBe sure to give users a place to submit feedback about the policies. Work with users to refine the policies or define new policies when issues arise. \n### [Final rollout](https://docs.databricks.com/archive/compute/policies-best-practices.html#id9) \nRemove access to the unrestricted policies for restricted users when the deadline is reached. The rollout of compute policies should now be complete.\n\n", "chunk_id": "7c12ee53608f1e7d414cbb9f528f03a9", "url": "https://docs.databricks.com/archive/compute/policies-best-practices.html"} +{"chunked_text": "# Databricks documentation archive\n### Best practices: Compute policies\n#### Specific challenges & strategies\n\nThe following are examples of applying compute policies to address specific challenges. Many of these strategies can be employed simultaneously but will require application of each strategy across all policies. For example, if using the tag enforcement strategy with the T-shirt size strategy, each T-shirt policy will also need a `custom_tags.*` policy. \n### Tag enforcement \n#### Challenge \nUsers can create compute freely, and there is no mechanism to enforce that they apply required tags. \n#### Solution \n1. Revoke [compute create permission](https://docs.databricks.com/security/auth-authz/entitlements.html) from users.\n2. 
Add a compute tag rule to any applicable compute policies. To add the compute tag rule to a policy, use the `custom_tags.` attribute. The value can be anything under an [unlimited policy](https://docs.databricks.com/admin/clusters/policy-definition.html#unlimited), or it can be restricted by [fixed](https://docs.databricks.com/admin/clusters/policy-definition.html#fixed), [allow list](https://docs.databricks.com/admin/clusters/policy-definition.html#allow), [block list](https://docs.databricks.com/admin/clusters/policy-definition.html#block), [regex](https://docs.databricks.com/admin/clusters/policy-definition.html#regex), or [range](https://docs.databricks.com/admin/clusters/policy-definition.html#range) policies. For example, to ensure correct chargeback and cost attribution, enforce a `COST_CENTER` tag on each policy restricted to a list of allowed cost center values: \n```\n{\"custom_tags.COST_CENTER\": {\"type\":\"allowlist\", \"values\":[\"9999\", \"9921\", \"9531\" ]}}\n\n``` \nAny user using this policy will have to fill in a `COST_CENTER` tag with 9999, 9921, or 9531 for the compute to launch.\n3. Assign the policy to users who should be able to charge against those three cost centers. Policies can be assigned at a user or group level through the [compute policy UI](https://docs.databricks.com/admin/clusters/policies.html#manage-policy) or the [Policies API](https://docs.databricks.com/api/workspace/clusterpolicies). The following example request body assigns a policy to the sales department: \n```\n{\n\"access_control_list\": [\n{\n\"user_name\": \"user@mydomain.com\",\n\"all_permissions\": [\n{\n\"permission_level\": \"CAN_USE\"\n}\n]\n},\n{\n\"group_name\": \"sales\",\n\"all_permissions\": [\n{\n\"permission_level\": \"CAN_USE\"\n}\n]\n}\n]\n}\n\n``` \n### Inexperienced users \n#### Challenge \nUsers are unfamiliar with compute or cloud infrastructure provisioning or overwhelmed with compute creation options. 
\n#### Solution \nUse compute policies to define \u201cT-shirt\u201d sized compute configurations, for example, small, medium, or large compute. \n1. Create a policy for each T-shirt size. T-shirt size policies indicate a relative compute size to the users and can either be flexible templates or zero option configurations. Zero option or low option policies will often have [fixed](https://docs.databricks.com/admin/clusters/policy-definition.html#fixed) and hidden policy rules. The following example defines a policy with a fixed value of `auto:latest-ml` for the `spark_version`. Setting the `hidden` flag to true will ensure this option is not visible to users. \n```\n{\"spark_version\": { \"type\": \"fixed\", \"value\": \"auto:latest-ml\", \"hidden\": true }}\n\n``` \nWhen defining flexible templates, you can use [range](https://docs.databricks.com/admin/clusters/policy-definition.html#range), [blocklist](https://docs.databricks.com/admin/clusters/policy-definition.html#block), [regex](https://docs.databricks.com/admin/clusters/policy-definition.html#regex), and [unlimited policy](https://docs.databricks.com/admin/clusters/policy-definition.html#unlimited) policies to set upper boundaries, non-optional fields, and semi-restricted policy elements. The following example defines a policy that enables autoscaling nodes to a maximum of 25. You can use this definition to set upper boundaries on each T-shirt size while providing some flexibility. To see more details of a compute template approach, see [Excessive resource usage](https://docs.databricks.com/archive/compute/policies-best-practices.html#excessive-resources). \n```\n{\"autoscale.max_workers\": { \"type\": \"range\", \"maxValue\": 25, \"defaultValue\": 5}}\n\n```\n2. Assign the policy to users who should be allowed to create T-shirt sized compute. Policies can be assigned at a user or a group level through the policy UI or the Policy Permissions API. 
For example, to assign this policy to all users through the UI: \n1. Go to the policy and select **Edit**.\n2. Select the **Permissions** tab.\n3. Select the **all users** option under **Groups** in the dropdown. \n![Assign policy to all users](https://docs.databricks.com/_images/policies-edit-policy-assign-all-users.png)\n3. Revoke access to the unrestricted policy from the groups that must use these new policies only. Once compute policies are in use, having access to the \u201ccompute creation\u201d permission gives users access to the unrestricted policy. It\u2019s important to revoke this permission for users that should not have it. \nTo revoke compute creation permissions, see [Configure compute creation permission](https://docs.databricks.com/security/auth-authz/entitlements.html). \n### Use case specific policies \n#### Challenge \nSome workloads or analyses are incompatible with existing policies, or users do not know the correct compute configuration for certain workload types. \n#### Solution \nIf you find workloads that don\u2019t work well with existing policies, it\u2019s often better to create new policies specifically targeted at those workloads instead of expanding existing policies. \nTo help users create compute using these policies, it can help to create policies tuned for specific use cases. Assign descriptive names to these policies to help users identify them. For example, if workloads will be querying a data source that supports predicate pushdown, a best practice is to build a specific policy that enforces autoscaling with a low or zero worker minimum. This policy will ensure that cloud provider and Databricks costs don\u2019t unnecessarily grow while waiting for the data source to compute the pushed down components of the query. \n1. Create a policy that enforces use case-specific best practices. This example defines a policy that has a fixed value of `0` for the minimum number of workers. 
This policy also enforces that the compute will autoscale, satisfying the predicate pushdown example\u2019s best practice. \n```\n{\"autoscale.min_workers\": { \"type\": \"fixed\", \"value\": \"0\", \"hidden\": false }}\n\n```\n2. Assign the policy to users who need to build compute for these use cases. You can assign policies at a user or a group level through the [policy UI](https://docs.databricks.com/admin/clusters/policies.html#manage-policy) or the [Permissions API](https://docs.databricks.com/api/workspace/permissions). For example, to assign this policy to a data scientist group through the UI: \n1. Go to the policy and select **Edit**.\n2. Select the **Permissions** tab.\n3. To assign a policy to a specific team, select the team\u2019s name in the **Select User or Group** dropdown. \n![Assign policy to a group](https://docs.databricks.com/_images/policies-edit-policy-assign-group.png) \n### Excessive resource usage \n#### Challenge \nUsers are creating unnecessarily large compute, consuming excessive and expensive resources. This is often caused by: \n* Failure to activate autoscaling.\n* Incorrect usage of auto termination windows.\n* High minimum worker node counts.\n* Expensive instance types. \n#### Solution \nPairing compute policies with an internal approval process will enable control over resources while also providing access to large compute resources when necessary. \n1. Establish a review process for granting access to larger or more flexible policies. The review process should have an intake form that collects information that supports the need for larger or more flexible compute configurations. The platform ownership team should evaluate this information to decide how to support the workload requirements. The following diagram illustrates an example approval process using T-shirt sizing: \n![Policies sizing process](https://docs.databricks.com/_images/policies-sizing-process.png)\n2. 
Create more flexible policies with fewer constraints and a focus on controlling governance items like tags. An example of a flexible policy: \n```\n{\n\"autoscale.min_workers\": {\n\"type\": \"range\",\n\"maxValue\": 20,\n\"defaultValue\": 2\n},\n\"autoscale.max_workers\": {\n\"type\": \"range\",\n\"maxValue\": 100,\n\"defaultValue\": 8\n},\n\"autotermination_minutes\": {\n\"type\": \"range\",\n\"maxValue\": 120,\n\"defaultValue\": 60\n},\n\"node_type_id\": {\n\"type\": \"blocklist\",\n\"values\": [\"z1d.12xlarge\", \"z1d.6xlarge\", \"r5d.16xlarge\", \"r5a.24xlarge\", \"i4i.32xlarge\"],\n\"defaultValue\": \"i3.xlarge\"\n},\n\"driver_node_type_id\": {\n\"type\": \"blocklist\",\n\"values\": [\"z1d.12xlarge\", \"z1d.6xlarge\", \"r5d.16xlarge\", \"r5a.24xlarge\", \"i4i.32xlarge\"],\n\"defaultValue\": \"i3.xlarge\"\n},\n\"spark_version\": {\n\"type\": \"fixed\",\n\"value\": \"auto:latest-ml\",\n\"hidden\": true\n},\n\"enable_elastic_disk\": {\n\"type\": \"fixed\",\n\"value\": true,\n\"hidden\": true\n},\n\"custom_tags.team\": {\n\"type\": \"fixed\",\n\"value\": \"product\"\n}\n}\n\n``` \n3. Document the upgrade and approval process and share it with users. It is also helpful to publish guidance on identifying the types of workloads that might need more flexibility or larger compute.\n4. Once a user is approved, assign the policy to them. 
Policies can be assigned at a user or a group level through the [policy UI](https://docs.databricks.com/admin/clusters/policies.html#manage-policy) or by submitting a request to the [Permissions API](https://docs.databricks.com/api/workspace/permissions): \n```\n{\n\"access_control_list\": [\n{\n\"user_name\": \"users_email@yourdomain.com\",\n\"permission_level\": \"CAN_USE\"\n}\n]\n}\n\n```\n\n", "chunk_id": "a8c69f263f7bbabcbbc2092072ad5b80", "url": "https://docs.databricks.com/archive/compute/policies-best-practices.html"} +{"chunked_text": "# Databricks documentation archive\n### Best practices: Compute policies\n#### Learn more\n\nTo learn more about compute policies on Databricks, see [Create and manage compute policies](https://docs.databricks.com/admin/clusters/policies.html) and our blog on compute policies: [Allow Simple Cluster Creation with Full Admin Control Using Cluster Policies](https://databricks.com/blog/2020/07/02/allow-simple-cluster-creation-with-full-admin-control-using-cluster-policies.html).\n\n", "chunk_id": "4b82c745ff9d5dd77b9b16f7998c34f5", "url": "https://docs.databricks.com/archive/compute/policies-best-practices.html"} +{"chunked_text": "# Databricks administration introduction\n## Create a workspace\n#### Manually create a workspace (new Databricks accounts)\n\nNote \nThese instructions apply to accounts created after November 8, 2023. If your Databricks account was created before November 8, 2023, see [Manually create a workspace (existing Databricks accounts)](https://docs.databricks.com/admin/workspace/create-workspace.html). \nThis article describes how to manually create a workspace using the account console and custom AWS configurations. You can use this process if you want to create your own AWS resources or need to deploy a workspace in your own VPC. 
If you don\u2019t need to create custom configurations for your deployment, Databricks recommends you create workspaces using the [AWS Quick Start template](https://docs.databricks.com/admin/workspace/quick-start.html). \nYou can also create workspaces using the [Account API](https://docs.databricks.com/admin/workspace/create-workspace-api.html) or [Terraform](https://docs.databricks.com/dev-tools/terraform/e2-workspace.html).\n\n", "chunk_id": "91d22c9ea673773caee13b5cf1d20f88", "url": "https://docs.databricks.com/admin/workspace/create-uc-workspace.html"} +{"chunked_text": "# Databricks administration introduction\n## Create a workspace\n#### Manually create a workspace (new Databricks accounts)\n##### Create a workspace with custom AWS configurations\n\n1. Go to the [account console](https://docs.databricks.com/admin/account-settings/index.html#account-console) and click the **Workspaces** icon.\n2. Click **Create Workspace**, then **Manual**.\n3. In the **Workspace name** field, enter a human-readable name for this workspace. It can contain spaces.\n4. In the **Region** field, select an AWS region for your workspace\u2019s network and clusters. Click **Next**.\n5. In the **Storage configuration** field, select or create a storage configuration. If you create a new storage credential, follow the steps listed in the UI. For more on storage configurations, see [Create an S3 bucket for workspace deployment](https://docs.databricks.com/admin/account-settings-e2/storage.html).\n6. Click **Next**.\n7. In the **Credential configuration** field, select or create the credential configuration you\u2019ll use for this workspace. To create a new credential configuration, follow the steps listed in the UI (also available in [Create an IAM role for workspace deployment](https://docs.databricks.com/admin/account-settings-e2/credentials.html)).\n8. (Optional) Set up any **Advanced configurations**. 
See [Advanced configurations](https://docs.databricks.com/admin/workspace/create-uc-workspace.html#advanced).\n9. Click **Next**.\n10. Review your workspace details and click **Create workspace**.\n\n", "chunk_id": "f35bc97adb858ced75f7d2f5f4e9234a", "url": "https://docs.databricks.com/admin/workspace/create-uc-workspace.html"} +{"chunked_text": "# Databricks administration introduction\n## Create a workspace\n#### Manually create a workspace (new Databricks accounts)\n##### Advanced configurations\n\nThe following configurations are optional when creating a new workspace. \n### Select a metastore \nConfirm the metastore assignment for your workspace. \n### Create workspace in your own VPC \nTo create the workspace in your own VPC, select or add a **Network configuration**. For instructions on configuring your own VPC, see [Configure a customer-managed VPC](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html). \nImportant \nIf you are using a customer-managed VPC, ensure your IAM role uses an [access policy that supports customer-managed VPCs](https://docs.databricks.com/admin/account-settings-e2/credentials.html#step-2). \n### Enable PrivateLink \nTo enable PrivateLink, select or add a [private access setting](https://docs.databricks.com/security/network/classic/private-access-settings.html) under **Private Link**. \nTo enable PrivateLink, you must also have created the correct regional VPC endpoints, registered them, and referenced them from your network configuration. For more guidance, see [Enable AWS PrivateLink](https://docs.databricks.com/security/network/classic/privatelink.html). \n### Add encryption key for managed services \nYou can add an encryption key to your workspace deployment that would encrypt notebooks, secrets, and Databricks SQL query data in the control plane. 
You can also rotate this key after your workspace is deployed.\n\n", "chunk_id": "c85d291c288c7a7a0651aab2a9e25798", "url": "https://docs.databricks.com/admin/workspace/create-uc-workspace.html"} +{"chunked_text": "# Databricks administration introduction\n## Create a workspace\n#### Manually create a workspace (new Databricks accounts)\n##### View workspace status\n\nAfter you create a workspace, you can view its status on the **Workspaces** page. \n* **Provisioning**: In progress. Wait a few minutes and refresh the page.\n* **Running**: Successful workspace deployment.\n* **Failed**: Failed deployment.\n* **Banned**: Contact your Databricks representative.\n* **Cancelling**: In the process of cancellation. \nIf the status for your new workspace is **Failed**, click the workspace to view a detailed error message. You can make updates to the configuration and try to deploy the workspace again. See [Troubleshooting creating workspaces](https://docs.databricks.com/admin/workspace/troubleshooting.html).\n\n#### Manually create a workspace (new Databricks accounts)\n##### Log into a workspace\n\n1. Go to the [account console](https://docs.databricks.com/admin/account-settings/index.html#account-console) and click the **Workspaces** icon.\n2. On the row with your workspace, click **Open**.\n3. To log in as a workspace administrator, log in with your account owner or account administrator email address and password. If you configured [single-sign on (SSO)](https://docs.databricks.com/admin/users-groups/single-sign-on/index.html), click the **Single Sign On** button.\n\n", "chunk_id": "eb60ec635a07cd1ee6ba7ebabf43e01a", "url": "https://docs.databricks.com/admin/workspace/create-uc-workspace.html"} +{"chunked_text": "# Databricks administration introduction\n## Create a workspace\n#### Manually create a workspace (new Databricks accounts)\n##### Next steps\n\nNow that you have deployed a workspace, you can start building out your data strategy. 
Databricks recommends the following articles: \n* Add users, groups, and service principals to your workspace. See [Manage users, service principals, and groups](https://docs.databricks.com/admin/users-groups/index.html).\n* Learn about data governance and managing data access in Databricks. See [What is Unity Catalog?](https://docs.databricks.com/data-governance/unity-catalog/index.html).\n* Connect your Databricks workspace to your external data sources. See [Connect to data sources](https://docs.databricks.com/connect/index.html).\n* Ingest your data into the workspace. See [Ingest data into a Databricks lakehouse](https://docs.databricks.com/ingestion/index.html).\n* Learn about managing access to workspace objects like notebooks, compute, dashboards, and queries. See [Access control lists](https://docs.databricks.com/security/auth-authz/access-control/index.html).\n\n", "chunk_id": "7de6f78e050145610bc36f277d576ca5", "url": "https://docs.databricks.com/admin/workspace/create-uc-workspace.html"} +{"chunked_text": "# Security and compliance guide\n## Auditing\n### privacy\n#### and compliance\n##### Compliance security profile\n####### IRAP compliance controls\n\nPreview \nThe ability for admins to add Enhanced Security and Compliance features is a feature in [Public Preview](https://docs.databricks.com/release-notes/release-types.html). The compliance security profile and support for compliance standards are generally available (GA). \nIRAP compliance controls provide enhancements that help you with Infosec Registered Assessors Program (IRAP) compliance for your workspace. \nIRAP provides high-quality information and communications technology (ICT) security assessment services to the Australian government. IRAP provides a framework for assessing the implementation and effectiveness of an organization\u2019s security controls against the Australian government\u2019s security requirements. Databricks is IRAP certified. 
\nIRAP compliance controls require enabling the *compliance security profile*, which adds monitoring agents, enforces instance types for inter-node encryption, provides a hardened compute image, and other features. For technical details, see [Compliance security profile](https://docs.databricks.com/security/privacy/security-profile.html). It is your responsibility to [confirm that each affected workspace has the compliance security profile enabled](https://docs.databricks.com/security/privacy/security-profile.html#verify) and confirm that IRAP is added as a compliance program. \nIRAP compliance controls are only available in the `ap-southeast-2` region.\n\n####### IRAP compliance controls\n######## Which compute resources get enhanced security\n\nThe compliance security profile enhancements apply to compute resources in the [classic compute plane](https://docs.databricks.com/getting-started/overview.html) in all regions. \nSupport for serverless SQL warehouses for the compliance security profile varies by region and it is supported in the `ap-southeast-2` region. See [Serverless SQL warehouses support the compliance security profile in some regions](https://docs.databricks.com/admin/sql/serverless.html#security-profile).\n\n", "chunk_id": "aac3dbb55428727b72bb806279f08f61", "url": "https://docs.databricks.com/security/privacy/irap.html"} +{"chunked_text": "# Security and compliance guide\n## Auditing\n### privacy\n#### and compliance\n##### Compliance security profile\n####### IRAP compliance controls\n######## Requirements\n\n* Your Databricks account must include the Enhanced Security and Compliance add-on. 
For details, see the [pricing page](https://databricks.com/product/pricing/platform-addons).\n* Your Databricks workspace is in the `ap-southeast-2` region.\n* Your Databricks workspace is on the Enterprise tier.\n* [Single sign-on (SSO)](https://docs.databricks.com/admin/account-settings-e2/single-sign-on/index.html) authentication is configured for the workspace.\n* Your workspace enables the [compliance security profile](https://docs.databricks.com/security/privacy/security-profile.html) and adds the IRAP compliance standard as part of the compliance security profile configuration.\n* You must use the following VM instance types: \n+ **General purpose:** `M-fleet`, `Md-fleet`, `M5dn`, `M5n`, `M5zn`, `M7g`, `M7gd`, `M6i`, `M7i`, `M6id`, `M6in`, `M6idn`, `M6a`, `M7a`\n+ **Compute optimized:** `C5a`, `C5ad`, `C5n`, `C6gn`, `C7g`, `C7gd`, `C7gn`, `C6i`, `C6id`, `C7i`, `C6in`, `C6a`, `C7a`\n+ **Memory optimized:** `R-fleet`, `Rd-fleet`, `R7g`, `R7gd`, `R6i`, `R7i`, `R7iz`, `R6id`, `R6in`, `R6idn`, `R6a`, `R7a`\n+ **Storage optimized:** `D3`, `D3en`, `P3dn`, `R5dn`, `R5n`, `I4i`, `I4g`, `I3en`, `Im4gn`, `Is4gen`\n+ **Accelerated computing:** `G4dn`, `G5`, `P4d`, `P4de`, `P5`\n* Ensure that sensitive information is never entered in customer-defined input fields, such as workspace names, cluster names, and job names.\n\n", "chunk_id": "976bf089cf641427a87a08a84a0f70f1", "url": "https://docs.databricks.com/security/privacy/irap.html"} +{"chunked_text": "# Security and compliance guide\n## Auditing\n### privacy\n#### and compliance\n##### Compliance security profile\n####### IRAP compliance controls\n######## Enable IRAP compliance controls\n\nTo configure your workspace to support processing of data regulated by the IRAP standard, the workspace must have the [compliance security profile](https://docs.databricks.com/security/privacy/security-profile.html) enabled. 
You can enable the compliance security profile and add the IRAP compliance standard across all workspaces or only on some workspaces. \nTo enable the compliance security profile and add the IRAP compliance standard for an existing workspace, see [Enable enhanced security and compliance features on a workspace](https://docs.databricks.com/security/privacy/enhanced-security-compliance.html#aws-workspace-config). To set an account-level setting to enable the compliance security profile and IRAP for new workspaces, see [Set account-level defaults for new workspaces](https://docs.databricks.com/security/privacy/enhanced-security-compliance.html#aws-account-level-defaults).\n\n", "chunk_id": "267903011ae9675ad841607e2df21287", "url": "https://docs.databricks.com/security/privacy/irap.html"} +{"chunked_text": "# Security and compliance guide\n## Auditing\n### privacy\n#### and compliance\n##### Compliance security profile\n####### IRAP compliance controls\n######## Preview features that are supported for processing data under the IRAP Protected standard\n\nThe following preview features are supported for processing data regulated under the IRAP Protected standard: \n* [SCIM provisioning](https://docs.databricks.com/admin/users-groups/scim/index.html)\n* [IAM passthrough](https://docs.databricks.com/archive/credential-passthrough/iam-passthrough.html)\n* [Secret paths in environment variables](https://docs.databricks.com/security/secrets/secrets.html#spark-conf-env-var)\n* [System tables](https://docs.databricks.com/admin/system-tables/index.html)\n* [Serverless SQL warehouse usage when compliance security profile is enabled](https://docs.databricks.com/admin/sql/serverless.html#security-profile), with support in some regions\n* [Filtering sensitive table data with row filters and column masks](https://docs.databricks.com/data-governance/unity-catalog/row-and-column-filters.html)\n* [Unified 
login](https://docs.databricks.com/admin/account-settings-e2/single-sign-on/index.html#unified-login)\n* [Lakehouse Federation to Redshift](https://docs.databricks.com/query-federation/redshift.html)\n* [Liquid clustering for Delta tables](https://docs.databricks.com/delta/clustering.html)\n* [Unity Catalog-enabled DLT pipelines](https://docs.databricks.com/delta-live-tables/unity-catalog.html)\n* [Databricks Assistant](https://docs.databricks.com/notebooks/databricks-assistant-faq.html)\n* Scala support for shared clusters\n* Delta Live Tables Hive metastore to Unity Catalog clone API\n\n", "chunk_id": "559d1eb08437151f9ca53c413780aa5a", "url": "https://docs.databricks.com/security/privacy/irap.html"} +{"chunked_text": "# Security and compliance guide\n## Auditing\n### privacy\n#### and compliance\n##### Compliance security profile\n####### IRAP compliance controls\n######## Does Databricks permit the processing of data regulated under IRAP Protected standard?\n\nYes, if you comply with the [requirements](https://docs.databricks.com/security/privacy/irap.html#requirements), enable the compliance security profile, and add the IRAP compliance standard as part of the compliance security profile configuration.\n\n", "chunk_id": "c6efe808d4cdd50e9851db2c89b72e86", "url": "https://docs.databricks.com/security/privacy/irap.html"} +{"chunked_text": "# Develop on Databricks\n## Databricks for R developers\n#### Comparing SparkR and sparklyr\n\nR users can choose between two APIs for Apache Spark: [SparkR](https://spark.apache.org/docs/latest/sparkr.html) and [sparklyr](https://spark.rstudio.com/). This article compares these APIs. Databricks recommends that you choose one of these APIs to develop a Spark application in R. 
Combining code from both of these APIs into a single script or Databricks notebook or job can make your code more difficult to read and maintain.\n\n#### Comparing SparkR and sparklyr\n##### API origins\n\n[SparkR](https://spark.apache.org/docs/latest/sparkr.html) is built by the Spark community and developers from Databricks. Because of this, SparkR closely follows the Spark [Scala classes](https://api-docs.databricks.com/scala/spark/latest/org/apache/spark/index.html) and [DataFrame API](https://spark.apache.org/docs/latest/sql-getting-started.html#creating-dataframes). \n[sparklyr](https://spark.rstudio.com/) started with [RStudio](https://www.rstudio.com/) and has since been donated to the Linux Foundation. sparklyr is tightly integrated into the [tidyverse](https://www.tidyverse.org/) in both its programming style and through API interoperability with [dplyr](https://dplyr.tidyverse.org/). \nSparkR and sparklyr are highly capable of working with big data in R. Within the past few years, their feature sets have been coming closer to parity.\n\n", "chunk_id": "a88543aee6145b651d31cb44b76d1440", "url": "https://docs.databricks.com/sparkr/sparkr-vs-sparklyr.html"} +{"chunked_text": "# Develop on Databricks\n## Databricks for R developers\n#### Comparing SparkR and sparklyr\n##### API differences\n\nThe following code example shows how to use SparkR and sparklyr from a Databricks notebook to read a CSV file from the [Sample datasets](https://docs.databricks.com/discover/databricks-datasets.html) into Spark. 
\n```\n# #############################################################################\n# SparkR usage\n\n# Note: To load SparkR into a Databricks notebook, run the following:\n\n# library(SparkR)\n\n# You can then remove \"SparkR::\" from the following function call.\n# #############################################################################\n\n# Use SparkR to read the airlines dataset from 2008.\nairlinesDF <- SparkR::read.df(path = \"/databricks-datasets/asa/airlines/2008.csv\",\nsource = \"csv\",\ninferSchema = \"true\",\nheader = \"true\")\n\n# Print the loaded dataset's class name.\ncat(\"Class of SparkR object: \", class(airlinesDF), \"\\n\")\n\n# Output:\n#\n# Class of SparkR object: SparkDataFrame\n\n# #############################################################################\n# sparklyr usage\n\n# Note: To install, load, and connect with sparklyr in a Databricks notebook,\n# run the following:\n\n# install.packages(\"sparklyr\")\n# library(sparklyr)\n# sc <- sparklyr::spark_connect(method = \"databricks\")\n\n# If you run \"library(sparklyr)\", you can then remove \"sparklyr::\" from the\n# preceding \"spark_connect\" and from the following function call.\n# #############################################################################\n\n# Use sparklyr to read the airlines dataset from 2007.\nairlines_sdf <- sparklyr::spark_read_csv(sc = sc,\nname = \"airlines\",\npath = \"/databricks-datasets/asa/airlines/2007.csv\")\n\n# Print the loaded dataset's class name.\ncat(\"Class of sparklyr object: \", class(airlines_sdf))\n\n# Output:\n#\n# Class of sparklyr object: tbl_spark tbl_sql tbl_lazy tbl\n\n``` \nHowever, if you try to run a sparklyr function on a `SparkDataFrame` object from SparkR, or if you try to run a SparkR function on a `tbl_spark` object from sparklyr, it will not work, as shown in the following code example. \n```\n# Try to call a sparklyr function on a SparkR SparkDataFrame object. 
It will not work.\nsparklyr::sdf_pivot(airlinesDF, DepDelay ~ UniqueCarrier)\n\n# Output:\n#\n# Error : Unable to retrieve a Spark DataFrame from object of class SparkDataFrame\n\n# Now try to call a SparkR function on a sparklyr tbl_spark object. It also will not work.\nSparkR::arrange(airlines_sdf, \"DepDelay\")\n\n# Output:\n#\n# Error in (function (classes, fdef, mtable) :\n# unable to find an inherited method for function \u2018arrange\u2019 for signature \u2018\"tbl_spark\", \"character\"\u2019\n\n``` \nThis is because sparklyr translates dplyr functions such as `arrange` into a SQL query plan that is used by SparkSQL. This is not the case with SparkR, which has functions for SparkSQL tables and Spark DataFrames. These behaviors are why Databricks does not recommend combining SparkR and sparklyr APIs in the same script, notebook, or job.\n\n", "chunk_id": "00a4bcae7430e807504a6676547fdcc7", "url": "https://docs.databricks.com/sparkr/sparkr-vs-sparklyr.html"} +{"chunked_text": "# Develop on Databricks\n## Databricks for R developers\n#### Comparing SparkR and sparklyr\n##### API interoperability\n\nIn rare cases where you cannot avoid combining the SparkR and sparklyr APIs, you can use SparkSQL as a kind of bridge. For instance, in this article\u2019s first example, sparklyr loaded the airlines dataset from 2007 into a table named `airlines`. 
You can use the SparkR `sql` function to query this table, for example: \n```\ntop10delaysDF <- SparkR::sql(\"SELECT\nUniqueCarrier,\nDepDelay,\nOrigin\nFROM\nairlines\nWHERE\nDepDelay NOT LIKE 'NA'\nORDER BY DepDelay\nDESC LIMIT 10\")\n\n# Print the class name of the query result.\ncat(\"Class of top10delaysDF: \", class(top10delaysDF), \"\\n\\n\")\n\n# Show the query result.\ncat(\"Top 10 airline delays for 2007:\\n\\n\")\nhead(top10delaysDF, 10)\n\n# Output:\n#\n# Class of top10delaysDF: SparkDataFrame\n#\n# Top 10 airline delays for 2007:\n#\n# UniqueCarrier DepDelay Origin\n# 1 AA 999 RNO\n# 2 NW 999 EWR\n# 3 AA 999 PHL\n# 4 MQ 998 RST\n# 5 9E 997 SWF\n# 6 AA 996 DFW\n# 7 NW 996 DEN\n# 8 MQ 995 IND\n# 9 MQ 994 SJT\n# 10 AA 993 MSY\n\n``` \nFor additional examples, see [Work with DataFrames and tables in R](https://docs.databricks.com/sparkr/dataframes-tables.html).\n\n", "chunk_id": "301e23869579b5522ad04d94aecd2999", "url": "https://docs.databricks.com/sparkr/sparkr-vs-sparklyr.html"} +{"chunked_text": "# Security and compliance guide\n## Networking\n### Classic compute plane networking\n##### Manage VPC endpoint registrations\n\nThis article describes how to manage VPC endpoint registrations in the account console.\n\n##### Manage VPC endpoint registrations\n###### What is a VPC endpoint registration?\n\nThis article discusses how to create Databricks VPC endpoint registration objects, which is a Databricks configuration object wrapping the regional AWS VPC endpoint. You must register AWS VPC endpoints to enable [AWS PrivateLink](https://aws.amazon.com/privatelink). An AWS VPC endpoint represents a connection from one VPC to a PrivateLink service in another VPC. \nThis article does not contain all the information necessary to configure PrivateLink for your workspace. For all requirements and steps, see [Enable AWS PrivateLink](https://docs.databricks.com/security/network/classic/privatelink.html). 
\nOne of the PrivateLink requirements is to use a [customer-managed VPC](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html), which you register with Databricks to create a network configuration object. For PrivateLink back-end support, that network configuration object must reference your VPC endpoint registrations (your registered VPC endpoints). For more information about network configurations, see [Enable AWS PrivateLink](https://docs.databricks.com/security/network/classic/privatelink.html) and [Create network configurations for custom VPC deployment](https://docs.databricks.com/admin/account-settings-e2/networks.html). \nIf you have multiple workspaces that share the same customer-managed VPC, you can choose to share the AWS VPC endpoints. You can also share these VPC endpoints among multiple Databricks accounts, in which case register the AWS VPC endpoint in each Databricks account.\n\n", "chunk_id": "76e24a4500593023d6a92f9349e976ef", "url": "https://docs.databricks.com/security/network/classic/vpc-endpoints.html"} +{"chunked_text": "# Security and compliance guide\n## Networking\n### Classic compute plane networking\n##### Manage VPC endpoint registrations\n###### Register a VPC endpoint\n\nNote \nThese instructions show you how to create the VPC endpoints from the **Cloud resources** page in the account console before you create a new workspace. You can also create the VPC endpoints in a similar way as part of the flow of creating or updating a new workspace and choosing **Register a VPC endpoint** from menus in the network configuration editor. See [Manually create a workspace (existing Databricks accounts)](https://docs.databricks.com/admin/workspace/create-workspace.html) and [Create network configurations for custom VPC deployment](https://docs.databricks.com/admin/account-settings-e2/networks.html). \n1. 
In the [account console](https://docs.databricks.com/admin/account-settings/index.html#account-console), click **Cloud resources**.\n2. Click **Network**.\n3. From the vertical navigation on the page, click **VPC endpoint registrations**.\n4. Click **Register a VPC endpoint**.\n5. In the **VPC endpoint registration name** field, type the human-readable name you\u2019d like for the new configuration. Databricks recommends including the region and the destination of this particular VPC endpoint. For example, if this is a VPC endpoint for back-end PrivateLink connectivity to the Databricks control plane secure cluster connectivity relay, you might name it something like `VPCE us-west-2 for SCC`.\n6. Choose the region. \nImportant \nThe region field must match your workspace region and the region of the AWS VPC endpoints that you are registering. However, Databricks validates this only during workspace creation (or when you update a workspace with PrivateLink), so it is critical that you carefully set the region in this step.\n7. In the **AWS VPC endpoint ID** field, paste the ID from the relevant AWS VPC endpoint.\n8. Click **Register new VPC endpoint**.\n\n", "chunk_id": "4efebc447f2bb05041c2c7f846775c34", "url": "https://docs.databricks.com/security/network/classic/vpc-endpoints.html"} +{"chunked_text": "# Security and compliance guide\n## Networking\n### Classic compute plane networking\n##### Manage VPC endpoint registrations\n###### Delete a VPC endpoint registration\n\nVPC endpoint registrations cannot be edited after creation. If the configuration has incorrect data or if you no longer need it, delete the VPC endpoint registration: \n1. In the [account console](https://docs.databricks.com/admin/account-settings/index.html#account-console), click **Cloud resources**.\n2. Click **Network**.\n3. From the vertical navigation on the page, click **VPC endpoint registrations**.\n4. 
On the row for the configuration, click the kebab menu ![Vertical Ellipsis](https://docs.databricks.com/_images/vertical-ellipsis.png) on the right, and select **Delete**.\n5. In the confirmation dialog, click **Confirm Delete**.\n\n", "chunk_id": "1a001d87e184beb96088799a1a73978e", "url": "https://docs.databricks.com/security/network/classic/vpc-endpoints.html"} +{"chunked_text": "# Technology partners\n### Connect to ingestion partners using Partner Connect\n\nPartner Connect offers the simplest way to connect your Databricks workspace to a data ingestion partner solution. You typically follow the steps in this article to connect to an ingestion partner solution using Partner Connect.\n\n### Connect to ingestion partners using Partner Connect\n#### Before you begin:\n\n* Confirm that you meet the [requirements](https://docs.databricks.com/partner-connect/index.html#requirements) for using Partner Connect.\n* See the appropriate partner connection guide. \nImportant \nYou might have to meet partner-specific requirements. You might also have to follow different steps than the steps in this article. This is because not all partner solutions are featured in Partner Connect, and because the connection experience can differ between partners in Partner Connect. \nTip \nIf you have an existing partner account, Databricks recommends that you log in to your partner account and connect to Databricks manually. This is because the connection experience in Partner Connect is optimized for new partner accounts.\n\n", "chunk_id": "6dac3a44cb55b5265045d54f66e0ce22", "url": "https://docs.databricks.com/partner-connect/ingestion.html"} +{"chunked_text": "# Technology partners\n### Connect to ingestion partners using Partner Connect\n#### Steps to connect to a data ingestion partner\n\nTo connect your Databricks workspace to a data ingestion partner solution, do the following: \n1. 
In the sidebar, click ![Partner Connect button](https://docs.databricks.com/_images/partner-connect.png) **Partner Connect**.\n2. Click the partner tile. \nNote \nIf the partner tile has a check mark icon inside it, an administrator has already used Partner Connect to connect the partner to your workspace. Skip to step 5. The partner uses the email address for your Databricks account to prompt you to sign in to your existing partner account.\n3. Select a catalog for the partner to write to, then click **Next**. \nNote \nIf a partner doesn\u2019t support Unity Catalog with Partner Connect, the workspace default catalog is used. If your workspace isn\u2019t Unity Catalog-enabled, `hive_metastore` is used. \nPartner Connect creates the following resources in your workspace: \n* A SQL warehouse named **`_ENDPOINT`** by default. You can change this default name before you click **Next**.\n* A Databricks [service principal](https://docs.databricks.com/admin/users-groups/service-principals.html) named **`_USER`**.\n* A Databricks [personal access token](https://docs.databricks.com/admin/users-groups/service-principals.html) that is associated with the **`_USER`** service principal. \nPartner Connect also grants the following privileges to the **`_USER`** service principal: \n* (Unity Catalog) `USE CATALOG`: Required to interact with objects in the selected catalog.\n* (Unity Catalog) `CREATE SCHEMA`: Required to interact with objects in the selected schema.\n* (Hive metastore) `USAGE`: Required to interact with objects in the Hive metastore.\n* (Hive metastore) `CREATE`: Grants the ability to create objects in the Hive metastore.\n4. Click **Next**. \nThe **Email** box displays the email address for your Databricks account. The partner uses this email address to prompt you to either create a new partner account or sign in to your existing partner account.\n5. Click **Connect to ``** or **Sign in**. \nA new tab opens in your web browser, which displays the partner website.\n6. 
Complete the on-screen instructions on the partner website to create your trial partner account or sign in to your existing partner account.\n\n", "chunk_id": "c42bf238240df4be75f2ed165378f188", "url": "https://docs.databricks.com/partner-connect/ingestion.html"} +{"chunked_text": "# Technology partners\n## Connect to data governance partners using Partner Connect\n#### Connect Databricks to Monte Carlo\n\nThis article describes how to connect your Databricks workspace to Monte Carlo. Monte Carlo monitors your data across your data warehouses, data lakes, ETL pipelines, and business intelligence tools and alerts for issues.\n\n#### Connect Databricks to Monte Carlo\n##### Connect to Monte Carlo using Partner Connect\n\n### Before you connect using Partner Connect \nBefore you connect to Monte Carlo using Partner Connect, review the following requirements and considerations: \n* You must be a Databricks workspace admin.\n* You must belong to the `Account Owners` authorization group for your Monte Carlo account.\n* Any workspace admin can delete a Monte Carlo connection from Partner Connect, but, only users who have Monte Carlo `Account Owner` permissions can delete the associated connection object in the Monte Carlo account. If a Databricks user doesn\u2019t have Monte Carlo `Account Owner` permissions, the deletion only removes the Partner Connect integration from the Databricks workspace. The integration remains intact in the Monte Carlo account.\n* A Monte Carlo account can only connect to one Databricks workspace using Partner Connect. If you try to connect a second workspace to a Monte Carlo account using Partner Connect, an error prompts you to connect manually. 
\n### Steps to connect using Partner Connect \nTo connect to Monte Carlo using Partner Connect, see [Connect to data governance partners using Partner Connect](https://docs.databricks.com/partner-connect/data-governance.html).\n\n#### Connect Databricks to Monte Carlo\n##### Connect to Monte Carlo manually\n\nTo connect to Databricks from Monte Carlo manually, see [Databricks](https://docs.getmontecarlo.com/docs/overview-databricks) in the Monte Carlo documentation.\n\n#### Connect Databricks to Monte Carlo\n##### Additional resources\n\n* [Website](https://www.montecarlodata.com/)\n* [Documentation](https://docs.getmontecarlo.com/)\n\n", "chunk_id": "f82967c6c71721f8c7341c46473ed139", "url": "https://docs.databricks.com/partners/data-governance/monte-carlo.html"} +{"chunked_text": "# Query data\n## Data format options\n#### Read and write XML files\n\nPreview \nThis feature is in [Public Preview](https://docs.databricks.com/release-notes/release-types.html). \nThis article describes how to read and write XML files. \nExtensible Markup Language (XML) is a markup language for formatting, storing, and sharing data in textual format. It defines a set of rules for serializing data ranging from documents to arbitrary data structures. \nNative XML file format support enables ingestion, querying, and parsing of XML data for batch processing or streaming. It can automatically infer and evolve schema and data types, supports SQL expressions like `from_xml`, and can generate XML documents. It doesn\u2019t require external jars and works seamlessly with Auto Loader, `read_files` and `COPY INTO`.\n\n#### Read and write XML files\n##### Requirements\n\nDatabricks Runtime 14.3 and above\n\n", "chunk_id": "5fe42958804c2da068e5d52f97df98f3", "url": "https://docs.databricks.com/query/formats/xml.html"} +{"chunked_text": "# Query data\n## Data format options\n#### Read and write XML files\n##### Parse XML records\n\nXML specification mandates a well-formed structure. 
However, this specification doesn\u2019t immediately map to a tabular format. You must specify the `rowTag` option to indicate the XML element that maps to a `DataFrame` `Row`. The `rowTag` element becomes the top-level `struct`. The child elements of `rowTag` become the fields of the top-level `struct`. \nYou can specify the schema for this record or let it be inferred automatically. Because the parser only examines the `rowTag` elements, DTD and external entities are filtered out. \nThe following examples illustrate schema inference and parsing of an XML file using different `rowTag` options: \n```\nxmlString = \"\"\"\n<books>\n<book id=\"bk103\">\n<author>Corets, Eva</author>\n<title>Maeve Ascendant</title>\n</book>\n<book id=\"bk104\">\n<author>Corets, Eva</author>\n<title>Oberon's Legacy</title>\n</book>\n</books>\"\"\"\n\nxmlPath = \"dbfs:/tmp/books.xml\"\ndbutils.fs.put(xmlPath, xmlString, True)\n\n``` \n```\nval xmlString = \"\"\"\n<books>\n<book id=\"bk103\">\n<author>Corets, Eva</author>\n<title>Maeve Ascendant</title>\n</book>\n<book id=\"bk104\">\n<author>Corets, Eva</author>\n<title>Oberon's Legacy</title>\n</book>\n</books>\"\"\"\nval xmlPath = \"dbfs:/tmp/books.xml\"\ndbutils.fs.put(xmlPath, xmlString)\n\n``` \nRead the XML file with `rowTag` option as \u201cbooks\u201d: \n```\ndf = spark.read.option(\"rowTag\", \"books\").format(\"xml\").load(xmlPath)\ndf.printSchema()\ndf.show(truncate=False)\n\n``` \n```\nval df = spark.read.option(\"rowTag\", \"books\").xml(xmlPath)\ndf.printSchema()\ndf.show(truncate=false)\n\n``` \nOutput: \n```\nroot\n|-- book: array (nullable = true)\n| |-- element: struct (containsNull = true)\n| | |-- _id: string (nullable = true)\n| | |-- author: string (nullable = true)\n| | |-- title: string (nullable = true)\n\n+------------------------------------------------------------------------------+\n|book |\n+------------------------------------------------------------------------------+\n|[{bk103, Corets, Eva, Maeve Ascendant}, {bk104, Corets, Eva, Oberon's Legacy}]|\n+------------------------------------------------------------------------------+\n\n``` \nRead the XML file with `rowTag` as \u201cbook\u201d: \n```\ndf = spark.read.option(\"rowTag\", 
\"book\").format(\"xml\").load(xmlPath)\n# Infers three top-level fields and parses `book` in separate rows:\n\n``` \n```\nval df = spark.read.option(\"rowTag\", \"book\").xml(xmlPath)\n// Infers three top-level fields and parses `book` in separate rows:\n\n``` \nOutput: \n```\nroot\n|-- _id: string (nullable = true)\n|-- author: string (nullable = true)\n|-- title: string (nullable = true)\n\n+-----+-----------+---------------+\n|_id |author |title |\n+-----+-----------+---------------+\n|bk103|Corets, Eva|Maeve Ascendant|\n|bk104|Corets, Eva|Oberon's Legacy|\n+-----+-----------+---------------+\n\n```\n\n", "chunk_id": "ab8006157be60a0bf35c41db32d4020e", "url": "https://docs.databricks.com/query/formats/xml.html"} +{"chunked_text": "# Query data\n## Data format options\n#### Read and write XML files\n##### Data source options\n\nData source options for XML can be specified the following ways: \n* The `.option/.options` methods of the following: \n+ DataFrameReader\n+ DataFrameWriter\n+ DataStreamReader\n+ DataStreamWriter\n* The following built-in functions: \n+ [from\\_xml](https://docs.databricks.com/sql/language-manual/functions/from_xml.html)\n+ [to\\_xml](https://docs.databricks.com/sql/language-manual/functions/to_xml.html)\n+ [schema\\_of\\_xml](https://docs.databricks.com/sql/language-manual/functions/schema_of_xml.html)\n* The `OPTIONS` clause of [CREATE TABLE USING DATA\\_SOURCE](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-using.html) \nFor a list of options, see [Auto Loader options](https://docs.databricks.com/ingestion/auto-loader/options.html).\n\n", "chunk_id": "f1bd7465f04186600614df9d5bfb2865", "url": "https://docs.databricks.com/query/formats/xml.html"} +{"chunked_text": "# Query data\n## Data format options\n#### Read and write XML files\n##### XSD support\n\nYou can optionally validate each row-level XML record by an XML Schema Definition (XSD). The XSD file is specified in the `rowValidationXSDPath` option. 
The XSD does not otherwise affect the schema provided or inferred. A record that fails the validation is marked as \u201ccorrupted\u201d and handled based on the corrupt record handling mode option described in the option section. \nYou can use `XSDToSchema` to extract a Spark DataFrame schema from a XSD file. It supports only simple, complex, and sequence types, and only supports basic XSD functionality. \n```\nimport org.apache.spark.sql.execution.datasources.xml.XSDToSchema\nimport org.apache.hadoop.fs.Path\n\nval xsdPath = \"dbfs:/tmp/books.xsd\"\nval xsdString = \"\"\"\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\"\"\"\n\ndbutils.fs.put(xsdPath, xsdString, true)\n\nval schema1 = XSDToSchema.read(xsdString)\nval schema2 = XSDToSchema.read(new Path(xsdPath))\n\n``` \nThe following table shows the conversion of XSD data types to Spark data types: \n| XSD Data Types | Spark Data Types |\n| --- | --- |\n| `boolean` | `BooleanType` |\n| `decimal` | `DecimalType` |\n| `unsignedLong` | `DecimalType(38, 0)` |\n| `double` | `DoubleType` |\n| `float` | `FloatType` |\n| `byte` | `ByteType` |\n| `short`, `unsignedByte` | `ShortType` |\n| `integer`, `negativeInteger`, `nonNegativeInteger`, `nonPositiveInteger`, `positiveInteger`, `unsignedShort` | `IntegerType` |\n| `long`, `unsignedInt` | `LongType` |\n| `date` | `DateType` |\n| `dateTime` | `TimestampType` |\n| `Others` | `StringType` |\n\n", "chunk_id": "d706e0e2a206e588b8393cf5c8d7fd46", "url": "https://docs.databricks.com/query/formats/xml.html"} +{"chunked_text": "# Query data\n## Data format options\n#### Read and write XML files\n##### Parse nested XML\n\nXML data in a string-valued column in an existing DataFrame can be parsed with `schema_of_xml` and `from_xml` that returns the schema and the parsed results as new `struct` columns. XML data passed as an argument to `schema_of_xml` and `from_xml` must be a single well-formed XML record. 
\n### schema\\_of\\_xml \n**Syntax** \n```\nschema_of_xml(xmlStr [, options] )\n\n``` \n**Arguments** \n* `xmlStr`: A STRING expression specifying a single well-formed XML record.\n* `options`: An optional `MAP` literal specifying directives. \n**Returns** \nA STRING holding a definition of a struct with n fields of strings where the column names are derived from the XML element and attribute names. The field values hold the derived formatted SQL types. \n### from\\_xml \n**Syntax** \n```\nfrom_xml(xmlStr, schema [, options])\n\n``` \n**Arguments** \n* `xmlStr`: A STRING expression specifying a single well-formed XML record.\n* `schema`: A STRING expression or invocation of the `schema_of_xml` function.\n* `options`: An optional `MAP` literal specifying directives. \n**Returns** \nA struct with field names and types matching the schema definition. Schema must be defined as comma-separated column name and data type pairs as used in, for example, `CREATE TABLE`. Most options shown in the [data source options](https://docs.databricks.com/query/formats/xml.html#options) are applicable with the\nfollowing exceptions: \n* `rowTag`: Because there is only one XML record, the `rowTag` option is not applicable.\n* `mode` (default: `PERMISSIVE`): Allows a mode for dealing with corrupt records during parsing. \n+ `PERMISSIVE`: When it meets a corrupted record, puts the malformed string into a field configured by `columnNameOfCorruptRecord`, and sets malformed fields to `null`. To keep corrupt records, you can set a string type field named `columnNameOfCorruptRecord` in a user-defined schema. If a schema does not have the field, it drops corrupt records during parsing. 
When inferring a schema, it implicitly adds a `columnNameOfCorruptRecord` field in an output schema.\n+ `FAILFAST`: Throws an exception when it meets corrupted records.\n\n", "chunk_id": "7aaf9df65ced35cfb72a123de76b5d1e", "url": "https://docs.databricks.com/query/formats/xml.html"} +{"chunked_text": "# Query data\n## Data format options\n#### Read and write XML files\n##### Structure conversion\n\nDue to the structure differences between DataFrame and XML, there are some conversion rules from XML data to `DataFrame` and from `DataFrame` to XML data. Note that handling attributes can be disabled with the option `excludeAttribute`. \n### Conversion from XML to DataFrame \n**Attributes**: Attributes are converted as fields with the heading prefix `attributePrefix`. \n```\n\ntwo\nthree\n\n\n``` \nproduces a schema below: \n```\nroot\n|-- _myOneAttrib: string (nullable = true)\n|-- two: string (nullable = true)\n|-- three: string (nullable = true)\n\n``` \n**Character data in an element containing attribute(s) or child element(s):** These are parsed into the `valueTag` field. If there are multiple occurrences of character data, the `valueTag` field is converted to an `array` type. \n```\n\ntwo\nsome value between elements\nthree\nsome other value between elements\n\n\n``` \nproduces a schema below: \n```\nroot\n|-- _VALUE: array (nullable = true)\n| |-- element: string (containsNull = true)\n|-- two: struct (nullable = true)\n| |-- _VALUE: string (nullable = true)\n| |-- _myTwoAttrib: string (nullable = true)\n|-- three: string (nullable = true)\n\n``` \n### Conversion from DataFrame to XML \n**Element as an array in an array**: Writing a XML file from `DataFrame` having a field\n`ArrayType` with its element as `ArrayType` would have an additional nested field for the\nelement. This would not happen in reading and writing XML data but writing a `DataFrame`\nread from other sources. 
Therefore, roundtrip in reading and writing XML files has the same\nstructure but writing a `DataFrame` read from other sources is possible to have a different\nstructure. \nDataFrame with a schema below: \n```\n|-- a: array (nullable = true)\n| |-- element: array (containsNull = true)\n| | |-- element: string (containsNull = true)\n\n``` \nand with data below: \n```\n+------------------------------------+\n| a|\n+------------------------------------+\n|[WrappedArray(aa), WrappedArray(bb)]|\n+------------------------------------+\n\n``` \nproduces a XML file below: \n```\n\naa\n\n\nbb\n\n\n``` \nThe element name of the unnamed array in the `DataFrame` is specified by the option `arrayElementName` (Default: `item`).\n\n", "chunk_id": "fb8ac3d349e78d5543eb6591a4089691", "url": "https://docs.databricks.com/query/formats/xml.html"} +{"chunked_text": "# Query data\n## Data format options\n#### Read and write XML files\n##### Rescued data column\n\nThe rescued data column ensures that you never lose or miss out on data during ETL. You can enable the rescued data column to capture any data that wasn\u2019t parsed because one or more fields in a record have one of the following issues: \n* Absent from the provided schema\n* Does not match the data type of the provided schema\n* Has a case mismatch with the field names in the provided schema \nThe rescued data column is returned as a JSON document containing the columns that were rescued, and the source file path of the record. 
To remove the source file path from the rescued data column, you can set the following SQL configuration: \n```\nspark.conf.set(\"spark.databricks.sql.rescuedDataColumn.filePath.enabled\", \"false\")\n\n``` \n```\nspark.conf.set(\"spark.databricks.sql.rescuedDataColumn.filePath.enabled\", \"false\")\n\n``` \nYou can enable the rescued data column by setting the option `rescuedDataColumn` to a column name when reading data, such as `_rescued_data` with `spark.read.option(\"rescuedDataColumn\", \"_rescued_data\").format(\"xml\").load()`. \nThe XML parser supports three modes when parsing records: `PERMISSIVE`, `DROPMALFORMED`, and `FAILFAST`. When used together with `rescuedDataColumn`, data type mismatches do not cause records to be dropped in `DROPMALFORMED` mode or throw an error in `FAILFAST` mode. Only corrupt records (incomplete or malformed XML) are dropped or throw errors.\n\n", "chunk_id": "89b6b56fa7bacb1081d64af5c7e2ccd4", "url": "https://docs.databricks.com/query/formats/xml.html"} +{"chunked_text": "# Query data\n## Data format options\n#### Read and write XML files\n##### Schema inference and evolution in Auto Loader\n\nFor a detailed discussion of this topic and applicable options, see [Configure schema inference and evolution in Auto Loader](https://docs.databricks.com/ingestion/auto-loader/schema.html). You can configure Auto Loader to automatically detect the schema of loaded XML data, allowing you to initialize tables without explicitly declaring the data schema and evolve the table schema as new columns are introduced. This eliminates the need to manually track and apply schema changes over time. \nBy default, Auto Loader schema inference seeks to avoid schema evolution issues due to type mismatches. For formats that don\u2019t encode data types (JSON, CSV, and XML), Auto Loader infers all columns as strings, including nested fields in XML files. 
The Apache Spark `DataFrameReader` uses a different behavior for schema inference, selecting data types for columns in XML sources based on sample data. To enable this behavior with Auto Loader, set the option `cloudFiles.inferColumnTypes` to `true`. \nAuto Loader detects the addition of new columns as it processes your data. When Auto Loader detects a new column, the stream stops with an `UnknownFieldException`. Before your stream throws this error, Auto Loader performs schema inference on the latest micro-batch of data and updates the schema location with the latest schema by merging new columns to the end of the schema. The data types of existing columns remain unchanged. Auto Loader supports different [modes for schema evolution](https://docs.databricks.com/ingestion/auto-loader/schema.html#evolution), which you set in the option `cloudFiles.schemaEvolutionMode`. \nYou can use [schema hints](https://docs.databricks.com/ingestion/auto-loader/schema.html#schema-hints) to enforce the schema information that you know and expect on an inferred schema. When you know that a column is of a specific data type, or if you want to choose a more general data type (for example, a double instead of an integer), you can provide an arbitrary number of hints for column data types as a string using SQL schema specification syntax. When the rescued data column is enabled, fields named in a case other than that of the schema are loaded to the `_rescued_data` column. You can change this behavior by setting the option `readerCaseSensitive` to `false`, in which case Auto Loader reads data in a case-insensitive way.\n\n", "chunk_id": "b126391a570a70d7f8736baa9c2e0649", "url": "https://docs.databricks.com/query/formats/xml.html"}

` must be between `` and ``, inclusive \n### ST\\_INVALID\\_SRID\\_VALUE \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nInvalid or unsupported SRID `` \n### ST\\_NOT\\_ENABLED \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \n`` is disabled or unsupported. Consider enabling Photon or switch to a tier that supports ST expressions \n### ST\\_UNSUPPORTED\\_RETURN\\_TYPE \n[SQLSTATE: 0A000](https://docs.databricks.com/error-messages/sqlstates.html#class-0a-feature-not-supported) \nThe GEOGRAPHY and GEOMETRY data types cannot be returned in queries. Use one of the following SQL expressions to convert them to standard interchange formats: ``. \n### [WKB\\_PARSE\\_ERROR](https://docs.databricks.com/error-messages/wkb-parse-error-error-class.html) \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nError parsing WKB: `` at position `` \nFor more details see [WKB\\_PARSE\\_ERROR](https://docs.databricks.com/error-messages/wkb-parse-error-error-class.html) \n### [WKT\\_PARSE\\_ERROR](https://docs.databricks.com/error-messages/wkt-parse-error-error-class.html) \n[SQLSTATE: 22023](https://docs.databricks.com/error-messages/sqlstates.html#class-22-data-exception) \nError parsing WKT: `` at position `` \nFor more details see [WKT\\_PARSE\\_ERROR](https://docs.databricks.com/error-messages/wkt-parse-error-error-class.html)\n\n", "chunk_id": "6f183fce7427368ff4003e2cf0d1150d", "url": "https://docs.databricks.com/error-messages/error-classes.html"} +{"chunked_text": "# \n### Compute\n\nDatabricks compute refers to the selection of computing resources available in the Databricks workspace. Users need access to compute to run data engineering, data science, and data analytics workloads, such as production ETL pipelines, streaming analytics, ad-hoc analytics, and machine learning. 
\nUsers can either connect to existing compute or create new compute if they have the proper permissions. \nYou can view the compute you have access to using the **Compute** section of the workspace: \n![All-purpose compute page in Databricks workspace](https://docs.databricks.com/_images/compute-page.png)\n\n", "chunk_id": "0d4f550cbc5af7c5014371acf5f115a2", "url": "https://docs.databricks.com/compute/index.html"} +{"chunked_text": "# \n### Compute\n#### Types of compute\n\nThese are the types of compute available in Databricks: \n* **Serverless compute for notebooks (Public Preview)**: On-demand, scalable compute used to execute SQL and Python code in notebooks.\n* **Serverless compute for workflows (Public Preview)**: On-demand, scalable compute used to run your Databricks jobs without configuring and deploying infrastructure. \n* **All-Purpose compute**: Provisioned compute used to analyze data in notebooks. You can create, terminate, and restart this compute using the UI, CLI, or REST API.\n* **Job compute**: Provisioned compute used to run automated jobs. The Databricks job scheduler automatically creates a job compute whenever a job is configured to run on new compute. The compute terminates when the job is complete. You *cannot* restart a job compute. See [Use Databricks compute with your jobs](https://docs.databricks.com/workflows/jobs/use-compute.html).\n* **Instance pools**: Compute with idle, ready-to-use instances, used to reduce start and autoscaling times. You can create this compute using the UI, CLI, or REST API. \n* **Serverless SQL warehouses**: On-demand elastic compute used to run SQL commands on data objects in the SQL editor or interactive notebooks. You can create SQL warehouses using the UI, CLI, or REST API. \n* **Classic SQL warehouses**: Used to run SQL commands on data objects in the SQL editor or interactive notebooks. You can create SQL warehouses using the UI, CLI, or REST API. 
\nThe articles in this section describe how to work with compute resources using the Databricks UI. For other methods, see [What is the Databricks CLI?](https://docs.databricks.com/dev-tools/cli/index.html) and the [Databricks REST API reference](https://docs.databricks.com/api/workspace).\n\n", "chunk_id": "e859978d9a9bd30b9a5f0c749f1821a4", "url": "https://docs.databricks.com/compute/index.html"} +{"chunked_text": "# \n### Compute\n#### Databricks Runtime\n\nDatabricks Runtime is the set of core components that run on your compute. The Databricks Runtime is a configurable setting in all-purpose or jobs compute but autoselected in SQL warehouses. \nEach Databricks Runtime version includes updates that improve the usability, performance, and security of big data analytics. The Databricks Runtime on your compute adds many features, including: \n* Delta Lake, a next-generation storage layer built on top of Apache Spark that provides ACID transactions, optimized layouts and indexes, and execution engine improvements for building data pipelines. See [What is Delta Lake?](https://docs.databricks.com/delta/index.html).\n* Installed Java, Scala, Python, and R libraries.\n* Ubuntu and its accompanying system libraries.\n* GPU libraries for GPU-enabled clusters.\n* Databricks services that integrate with other components of the platform, such as notebooks, jobs, and cluster management. \nFor information about the contents of each runtime version, see the [release notes](https://docs.databricks.com/release-notes/runtime/index.html). \n### Runtime versioning \nDatabricks Runtime versions are released on a regular basis: \n* **Long Term Support** versions are represented by an **LTS** qualifier (for example, **3.5 LTS**). For each major release, we declare a \u201ccanonical\u201d feature version, for which we provide three full years of support. 
See [Databricks runtime support lifecycles](https://docs.databricks.com/release-notes/runtime/databricks-runtime-ver.html) for more information.\n* **Major** versions are represented by an increment to the version number that precedes the decimal point (the jump from 3.5 to 4.0, for example). They are released when there are major changes, some of which may not be backwards-compatible.\n* **Feature** versions are represented by an increment to the version number that follows the decimal point (the jump from 3.4 to 3.5, for example). Each major release includes multiple feature releases. Feature releases are always backward compatible with previous releases within their major release.\n\n", "chunk_id": "0c11047c59c93fa84ecc01a09f217488", "url": "https://docs.databricks.com/compute/index.html"} +{"chunked_text": "# \n### Compute\n#### What is Serverless Compute?\n\nServerless compute enhances productivity, cost efficiency, and reliability in the following ways: \n* **Productivity**: Cloud resources are managed by Databricks, reducing management overhead and providing instant compute to enhance user productivity.\n* **Efficiency**: Serverless compute offers rapid start-up and scaling times, minimizing idle time and ensuring you only pay for the compute you use.\n* **Reliability**: With serverless compute, capacity handling, security, patching, and upgrades are managed automatically, alleviating concerns about security policies and capacity shortages.\n\n### Compute\n#### What are Serverless SQL Warehouses?\n\nDatabricks SQL delivers optimal price and performance with serverless SQL warehouses. Key advantages of serverless warehouses over pro and classic models include: \n* **Instant and elastic compute**: Eliminates waiting for infrastructure resources and avoids resource over-provisioning during usage spikes. Intelligent workload management dynamically handles scaling. 
See [SQL warehouse types](https://docs.databricks.com/admin/sql/warehouse-types.html) for more information on intelligent workload management and other serverless features.\n* **Minimal management overhead**: Capacity management, patching, upgrades, and performance optimization are all handled by Databricks, simplifying operations and leading to predictable pricing.\n* **Lower total cost of ownership (TCO)**: Automatic provisioning and scaling of resources as needed helps avoid over-provisioning and reduces idle times, thus lowering TCO.\n\n", "chunk_id": "9ac26067221608897c3754607612491f", "url": "https://docs.databricks.com/compute/index.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Upgrade Hive tables and views to Unity Catalog\n\nThis article describes how to upgrade tables and views registered in your existing workspace-local Hive metastore to Unity Catalog. You can upgrade a Hive table either to a *managed table* or *external table* in Unity Catalog. \n* **Managed tables** are the preferred way to create tables in Unity Catalog. Unity Catalog fully manages their lifecycle, file layout, and storage. Unity Catalog also optimizes their performance automatically. Managed tables always use the [Delta](https://docs.databricks.com/delta/index.html) table format. \nManaged tables reside in a [managed storage location](https://docs.databricks.com/data-governance/unity-catalog/index.html#managed-storage) that you reserve for Unity Catalog. Because of this storage requirement, you must use [CLONE](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#clone) or [CREATE TABLE AS SELECT](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#create-table-as-select) (CTAS) if you want to copy existing Hive tables to Unity Catalog as managed tables.\n* **External tables** are tables whose data lifecycle, file layout, and storage location are not managed by Unity Catalog. 
Multiple data formats are supported for external tables. \nTypically you use external tables only when you also need direct access to data using non-Databricks compute (that is, not using Databricks clusters or Databricks SQL warehouses). External tables are also convenient in migration scenarios, because you can register existing data in Unity Catalog quickly without having to copy that data. This is thanks to the fact that data in external tables doesn\u2019t have to reside in reserved managed storage. \nFor more information about managed and external tables in Unity Catalog, see [Tables](https://docs.databricks.com/data-governance/unity-catalog/index.html#table).\n\n", "chunk_id": "40806a00f1559134fbf2431fd5f67015", "url": "https://docs.databricks.com/data-governance/unity-catalog/migrate.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Upgrade Hive tables and views to Unity Catalog\n##### Hive to Unity Catalog migration options\n\nWhen you are ready to migrate Hive tables to Unity Catalog, you have several options, depending on your use case: \n| Migration tool | Description | Hive table requirements | Unity Catalog table created | Why should I use it? |\n| --- | --- | --- | --- | --- |\n| [UCX](https://docs.databricks.com/data-governance/unity-catalog/ucx.html) | A comprehensive set of command-line utilities and other tools that assess your workspace\u2019s readiness for Unity Catalog migration and perform workflows that migrate identities, permissions, storage locations, and tables to Unity Catalog. UCX is available on GitHub at [databrickslabs/ucx](https://github.com/databrickslabs/ucx). | Managed or external Hive tables | Managed or external | You want a comprehensive workspace upgrade planning tool that goes beyond upgrading Hive tables to Unity Catalog. You want to upgrade workspaces that have large amounts of data in the Hive metastore. You are comfortable running scripts. 
If you want to perform a bulk upgrade of Hive tables to Unity Catalog managed tables, this is your only option. UCX, like all Databricks Labs projects, is a public GitHub repo and not supported directly by Databricks. |\n| [Unity Catalog upgrade wizard](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#wizard-bulk) | A Catalog Explorer feature that enables you to bulk-copy entire schemas (databases) and multiple managed and external tables from your Hive metastore to the Unity Catalog metastore as external tables. The upgrade wizard performs the `SYNC` command on the tables that you select, leaving the original Hive tables intact. You have the option to schedule regular upgrades in order to pick up changes to the source Hive tables. | Managed or external Hive tables | External only | You want to quickly upgrade your Hive tables to external tables in Unity Catalog, and you prefer a visual interface. The ability to schedule regular syncs when the source Hive table changes makes it a useful tool for managing a \u201chybrid\u201d Hive and Unity Catalog workspace during the transition to Unity Catalog. |\n| [SYNC SQL command](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#sync) | `SYNC` enables you to copy external tables and managed tables (if the managed tables are stored outside of Databricks workspace storage, sometimes known as DBFS root) in your Hive metastore to external tables in Unity Catalog. You can sync individual tables or entire schemas. `SYNC` is designed to be run on a schedule to pick up new changes in the Hive metastore and sync them to Unity Catalog. | Managed or external Hive tables | External only | You want to quickly upgrade your Hive tables to external tables in Unity Catalog, and you prefer to use SQL commands rather than a visual interface. 
Scheduling regular `SYNC` runs to update existing Unity Catalog tables when the source Hive table changes makes it a useful tool for managing a \u201chybrid\u201d Hive and Unity Catalog workspace during the transition to Unity Catalog. Because you cannot use `SYNC` to upgrade managed tables that are in Databricks workspace storage, use [CREATE TABLE CLONE](https://docs.databricks.com/sql/language-manual/delta-clone.html) for those tables. |\n| [CREATE TABLE CLONE SQL command](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#clone) | `CREATE TABLE CLONE` enables you to upgrade managed tables in your Hive metastore to managed tables in Unity Catalog. You can clone individual tables. Deep clones are preferred, because they copy source table data to the clone target in addition to the existing table metadata. | Managed Hive tables that are in Delta, Parquet, or Iceberg format. Cloning Parquet and Iceberg source tables has some specific requirements and limitations: see [Requirements and limitations for cloning Parquet and Iceberg tables](https://docs.databricks.com/delta/clone-parquet.html#limitations). | Managed only | You want to migrate Hive managed tables to Unity Catalog managed tables to take full advantage of Unity Catalog data governance, and your Hive tables meet the criteria listed in the \u201cHive table requirements\u201d cell. If your Hive tables do not meet the \u201cHive table requirements\u201d, you can use the [CREATE TABLE AS SELECT SQL command](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#create-table-as-select) to upgrade a Hive table to a Unity Catalog managed table. However `CLONE` is almost always preferred. Cloning has simpler syntax than `CREATE TABLE AS SELECT`: you don\u2019t need to specify partitioning, format, invariants, nullability, stream, `COPY INTO`, and other metadata, because these are cloned from the source table. 
| \nThis article describes how to perform all but the UCX-driven upgrade process. Databricks recommends UCX for most workspace upgrade scenarios. However, for simpler use cases, you might prefer one or more of the tools described here.\n\n", "chunk_id": "947e2a7aca91a73b81e08da36839ddec", "url": "https://docs.databricks.com/data-governance/unity-catalog/migrate.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Upgrade Hive tables and views to Unity Catalog\n##### Before you begin\n\nThis section describes some of the impacts of migration that you should be prepared for, along with permissions and compute requirements. \n### Understand the impact \nYou should be aware that when you modify your workloads to use the new Unity Catalog tables, you might need to change some behaviors: \n* Unity Catalog manages partitions differently than Hive. Hive commands that directly manipulate partitions are not supported on tables managed by Unity Catalog.\n* Table history is not migrated when you run `CREATE TABLE CLONE`. Any tables in the Hive metastore that you clone to Unity Catalog are treated as new tables. You cannot perform Delta Lake time travel or other operations that rely on pre-migration history. \nFor more information, see [Work with Unity Catalog and the legacy Hive metastore](https://docs.databricks.com/data-governance/unity-catalog/hive-metastore.html). \n### Requirements \nTo perform migrations, you must have: \n* A workspace that has a Unity Catalog metastore and at least one Unity Catalog catalog. See [Set up and manage Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/get-started.html).\n* Privileges on the Unity Catalog catalogs to which you are migrating tables. 
These privilege requirements are enumerated at the start of each procedure covered in this article.\n* For migration to Unity Catalog external tables: storage credentials and external locations defined in Unity Catalog, and the `CREATE EXTERNAL TABLE` privilege on the external location.\n* Access to Databricks compute that meets both of the following requirements: \n+ Supports Unity Catalog (SQL warehouses or compute resources that use single-user or shared access mode).\n+ Allows access to the tables in the Hive metastore. Because compute resources that use shared access mode are enabled for [legacy table access control](https://docs.databricks.com/data-governance/table-acls/index.html) by default, if you use that access mode, you must have table access control privileges on the Hive metastore that you are migrating from. You can grant yourself access using the following SQL command: \n```\nGRANT all_privileges ON catalog hive_metastore TO ``\n\n``` \nAlternatively, you can use a compute resource in single-user access mode. \nFor more information about managing privileges on objects in the Hive metastore, see [Hive metastore privileges and securable objects (legacy)](https://docs.databricks.com/data-governance/table-acls/object-privileges.html). For more information about managing privileges on objects in the Unity Catalog metastore, see [Manage privileges in Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/index.html).\n\n", "chunk_id": "1c2d6aafef2b886ed7b8dc2679461d26", "url": "https://docs.databricks.com/data-governance/unity-catalog/migrate.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Upgrade Hive tables and views to Unity Catalog\n##### Identify tables that are managed by the Hive metastore\n\nTo determine whether a table is currently registered in Unity Catalog, check the catalog name. 
Tables in the catalog `hive_metastore` are registered in the workspace-local Hive metastore. Any other catalogs listed are governed by Unity Catalog. \nTo view the tables in the `hive_metastore` catalog using Catalog Explorer: \n1. Click ![Catalog icon](https://docs.databricks.com/_images/data-icon.png) **Catalog** in the sidebar.\n2. In the catalog pane, browse to the `hive_metastore` catalog and expand the schema nodes. \nYou can also search for a specific table using the filter field in the Catalog pane.\n\n", "chunk_id": "c370d417c45749f5341bbe498ef724b3", "url": "https://docs.databricks.com/data-governance/unity-catalog/migrate.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Upgrade Hive tables and views to Unity Catalog\n##### Upgrade a schema or multiple tables from the Hive metastore to Unity Catalog external tables using the upgrade wizard\n\nYou can copy complete schemas (databases) and multiple external or managed tables from your Databricks default Hive metastore to the Unity Catalog metastore using the **Catalog Explorer** upgrade wizard. The upgraded tables will be external tables in Unity Catalog. \nFor help deciding when to use the upgrade wizard, see [Hive to Unity Catalog migration options](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#comparison-table). \n### Requirements \n**Data format requirements**: \n* See [External tables](https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#external-table). \n**Compute requirements**: \n* A compute resource that supports Unity Catalog. See [Before you begin](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#before). 
\n**Unity Catalog object and permission requirements**: \n* A [storage credential](https://docs.databricks.com/connect/unity-catalog/storage-credentials.html) for an IAM role that authorizes Unity Catalog to access the tables\u2019 location path.\n* An [external location](https://docs.databricks.com/connect/unity-catalog/external-locations.html) that references the storage credential you just created and the path to the data on your cloud tenant.\n* `CREATE EXTERNAL TABLE` permission on the external locations of the tables to be upgraded. \n**Hive table access requirements**: \n* If your compute uses shared access mode, you need access to the tables in the Hive metastore, granted using legacy table access control. See [Before you begin](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#before). \n### Upgrade process \n1. Click ![Catalog icon](https://docs.databricks.com/_images/data-icon.png) **Catalog** in the sidebar to open the [Catalog Explorer](https://docs.databricks.com/catalog-explorer/index.html).\n2. Select `hive_metastore` as your catalog and select the schema (database) that you want to upgrade. \n![Select database](https://docs.databricks.com/_images/data-explorer-select-database.png)\n3. Click **Upgrade** at the top right of the schema detail view.\n4. Select all of the tables that you want to upgrade and click **Next**. \nOnly [external tables](https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#create-an-external-table) in [formats supported by Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/index.html#external-table) can be upgraded using the upgrade wizard.\n5. Set the destination catalog, schema (database), and owner for each table. \nUsers will be able to access the newly created table in the context of their privileges on the [catalog and schema](https://docs.databricks.com/data-governance/unity-catalog/index.html#object-model). 
\nTable owners have all privileges on the table, including `SELECT` and `MODIFY`. If you don\u2019t select an owner, the managed tables are created with you as the owner. Databricks generally recommends that you grant table ownership to groups. To learn more about object ownership in Unity Catalog, see [Manage Unity Catalog object ownership](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/ownership.html). \nTo assign the same catalog and schema to multiple tables, select the tables and click the **Set destination** button. \nTo assign the same owner to multiple tables, select the tables and click the **Set owner** button.\n6. Review the table configurations. To modify them, click the **Previous** button.\n7. Click **Create Query for Upgrade**. \nA query editor appears with generated SQL statements.\n8. Run the query. \nWhen the query is done, each table\u2019s metadata has been copied from Hive metastore to Unity Catalog. These tables are marked as upgraded in the upgrade wizard.\n9. Define fine-grained access control using the **Permissions** tab of each new table.\n10. (Optional) Add comments to each upgraded Hive table that point users to the new Unity Catalog table. \nReturn to the original table in the `hive_metastore` catalog to add the table comment. \nIf you use the following syntax in the table comment, notebooks and SQL query editor queries that reference the deprecated Hive table will display the deprecated table name using strikethrough text, display the comment as a warning, and provide a **Quick Fix** link to Databricks Assistant, which can update your code to reference the new table. \n```\nThis table is deprecated. Please use catalog.default.table instead of hive_metastore.schema.table.\n\n``` \nSee [Add comments to indicate that a Hive table has been migrated](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#deprecated-comment).\n11. Modify your workloads to use the new tables. 
\nIf you added a comment to the original Hive table like the one listed in the optional previous step, you can use the **Quick Fix** link and Databricks Assistant to help you find and modify workloads.\n\n", "chunk_id": "f2141406ab60254b178e2d458095d1bd", "url": "https://docs.databricks.com/data-governance/unity-catalog/migrate.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Upgrade Hive tables and views to Unity Catalog\n##### Upgrade a single Hive table to a Unity Catalog external table using the upgrade wizard\n\nYou can copy a single table from your default Hive metastore to the Unity Catalog metastore using the upgrade wizard in **Catalog Explorer**. \nFor help deciding when to use the upgrade wizard, see [Hive to Unity Catalog migration options](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#comparison-table). \n### Requirements \n**Data format requirements**: \n* See [External tables](https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#external-table). \n**Compute requirements**: \n* A compute resource that supports Unity Catalog. See [Before you begin](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#before). \n**Unity Catalog object and permission requirements**: \n* A [storage credential](https://docs.databricks.com/connect/unity-catalog/storage-credentials.html) for an IAM role that authorizes Unity Catalog to access the table\u2019s location path.\n* An [external location](https://docs.databricks.com/connect/unity-catalog/external-locations.html) that references the storage credential you just created and the path to the data on your cloud tenant.\n* `CREATE EXTERNAL TABLE` permission on the external locations of the tables to be upgraded. \n### Upgrade process \nTo upgrade an external table: \n1. 
Click ![Catalog icon](https://docs.databricks.com/_images/data-icon.png) **Catalog** in the sidebar to open [Catalog Explorer](https://docs.databricks.com/catalog-explorer/index.html).\n2. Select the database, then the table, that you want to upgrade.\n3. Click **Upgrade** in the top-right corner of the table detail view.\n4. Select the table to upgrade and click **Next**.\n5. Select your destination catalog, schema (database), and owner. \nUsers will be able to access the newly created table in the context of their privileges on the [catalog and schema](https://docs.databricks.com/data-governance/unity-catalog/index.html#object-model). \nTable owners have all privileges on the table, including `SELECT` and `MODIFY`. If you don\u2019t select an owner, the managed table is created with you as the owner. Databricks generally recommends that you grant table ownership to groups. To learn more about object ownership in Unity Catalog, see [Manage Unity Catalog object ownership](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/ownership.html).\n6. Click **Upgrade** in the top-right corner of the table detail view.\n7. Select the table to upgrade and click **Next**. \nThe table metadata is now copied to Unity Catalog, and a new table has been created. You can now use the **Permissions** tab to define fine-grained access control.\n8. Use the **Permissions** tab to define fine-grained access control.\n9. (Optional) Add a comment to the Hive table that points users to the new Unity Catalog table. \nReturn to the original table in the `hive_metastore` catalog to add the table comment. \nIf you use the following syntax in the table comment, notebooks and SQL query editor queries that reference the deprecated Hive table will display the deprecated table name using strikethrough text, display the comment as a warning, and provide a **Quick Fix** link to Databricks Assistant, which can update your code to reference the new table. 
\n```\nThis table is deprecated. Please use catalog.default.table instead of hive_metastore.schema.table.\n\n``` \nSee [Add comments to indicate that a Hive table has been migrated](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#deprecated-comment).\n10. Modify existing workloads to use the new table. \nIf you added a comment to the original Hive table like the one listed in the optional previous step, you can use the **Quick Fix** link and Databricks Assistant to help you find and modify workloads. \nNote \nIf you no longer need the old table, you can drop it from the Hive metastore. Dropping an external table does not modify the data files on your cloud tenant.\n\n", "chunk_id": "5c0ae0a3b198a5ea3ced930a03811068", "url": "https://docs.databricks.com/data-governance/unity-catalog/migrate.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Upgrade Hive tables and views to Unity Catalog\n##### Upgrade a Hive table to a Unity Catalog external table using SYNC\n\nYou can use the `SYNC` SQL command to copy external tables in your Hive metastore to external tables in Unity Catalog. You can sync individual tables or entire schemas. \nYou can also use `SYNC` to copy Hive managed tables that are stored outside of Databricks workspace storage (sometimes called DBFS root) to external tables in Unity Catalog. You cannot use it to copy Hive managed tables stored in workspace storage. To copy those tables, use [CREATE TABLE CLONE](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#clone) instead. \nThe `SYNC` command performs a write operation to each source table it upgrades to add additional table properties for bookkeeping, including a record of the target Unity Catalog external table. \n`SYNC` can also be used to update existing Unity Catalog tables when the source tables in the Hive metastore are changed. This makes it a good tool for transitioning to Unity Catalog gradually. 
\nFor details, see [SYNC](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-sync.html). For help deciding when to use `SYNC`, see [Hive to Unity Catalog migration options](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#comparison-table). \n### Requirements \n**Data format requirements**: \n* See [External tables](https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#external-table). \n**Compute requirements**: \n* A compute resource that supports Unity Catalog. See [Before you begin](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#before). \n**Unity Catalog object and permission requirements**: \n* A [storage credential](https://docs.databricks.com/connect/unity-catalog/storage-credentials.html) for an IAM role that authorizes Unity Catalog to access the tables\u2019 location path.\n* An [external location](https://docs.databricks.com/connect/unity-catalog/external-locations.html) that references the storage credential you just created and the path to the data on your cloud tenant.\n* `CREATE EXTERNAL TABLE` permission on the external locations of the tables to be upgraded. \n**Hive table access requirements**: \n* If your compute uses shared access mode, you need access to the tables in the Hive metastore, granted using legacy table access control. See [Before you begin](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#before). \n### Upgrade process \nTo upgrade tables in your Hive metastore to Unity Catalog external tables using `SYNC`: \n1. In a notebook or the SQL query editor, run one of the following: \nSync an external Hive table: \n```\nSYNC TABLE .. FROM hive_metastore..\nSET OWNER ;\n\n``` \nSync an external Hive schema and all of its tables: \n```\nSYNC SCHEMA . FROM hive_metastore.\nSET OWNER ;\n\n``` \nSync a managed Hive table that is stored outside of Databricks workspace storage: \n```\nSYNC TABLE .. 
AS EXTERNAL FROM hive_metastore..\nSET OWNER ;\n\n``` \nSync a schema that contains managed Hive tables that are stored outside of Databricks workspace storage: \n```\nSYNC SCHEMA . AS EXTERNAL FROM hive_metastore.\nSET OWNER ;\n\n```\n2. Grant account-level users or groups access to the new table. See [Manage privileges in Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/index.html).\n3. (Optional) Add a comment to the original Hive table that points users to the new Unity Catalog table. \nReturn to the original table in the `hive_metastore` catalog to add the table comment. To learn how to add table comments using Catalog Explorer, see [Add markdown comments to data objects using Catalog Explorer](https://docs.databricks.com/catalog-explorer/markdown-data-comments.html#manual). To learn how to add table comments using SQL statements in a notebook or the SQL query editor, see [COMMENT ON](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-comment.html). \nIf you use the following syntax in the table comment, notebooks and SQL query editor queries that reference the deprecated Hive table will display the deprecated table name using strikethrough text, display the comment as a warning, and provide a **Quick Fix** link to Databricks Assistant, which can update your code to reference the new table. \n```\nThis table is deprecated. Please use catalog.default.table instead of hive_metastore.schema.table.\n\n``` \nSee [Add comments to indicate that a Hive table has been migrated](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#deprecated-comment).\n4. After the table is migrated, users should update their existing queries and workloads to use the new table. \nIf you added a comment to the original Hive table like the one listed in the optional previous step, you can use the **Quick Fix** link and Databricks Assistant to help you find and modify workloads.\n5. 
Before you drop the old table, test for dependencies by revoking access to it and re-running related queries and workloads. \nDon\u2019t drop the old table if you are still relying on deprecation comments to help you find and update existing code that references the old table. Likewise, don\u2019t drop the old table if that table has changed since your original sync: `SYNC` can be used to update existing Unity Catalog tables with changes from source Hive tables.\n\n", "chunk_id": "7d170e236ddc5a81dc39820383e77f3a", "url": "https://docs.databricks.com/data-governance/unity-catalog/migrate.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Upgrade Hive tables and views to Unity Catalog\n##### Upgrade a Hive managed table to a Unity Catalog managed table using CLONE\n\nUse `CREATE TABLE CLONE` to upgrade managed tables in your Hive metastore to managed tables in Unity Catalog. You can clone individual tables. *Deep clones* copy source table data to the clone target in addition to the existing table metadata. Use deep clone if you intend to drop the Hive source table. *Shallow clones* do not copy the data files to the clone target but give access to them by reference to the source data: the table metadata is equivalent to the source. Shallow clones are cheaper to create but require that users who query data in the clone target also have access to the source data. \nFor help deciding when to use `CLONE`, see [Hive to Unity Catalog migration options](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#comparison-table). For help deciding which clone type to use, see [Clone a table on Databricks](https://docs.databricks.com/delta/clone.html). \n### Requirements \n**Data format requirements**: \n* Managed Hive tables in Delta, Parquet, or Iceberg format. Cloning Parquet and Iceberg source tables has some specific requirements and limitations. 
See [Requirements and limitations for cloning Parquet and Iceberg tables](https://docs.databricks.com/delta/clone-parquet.html#limitations). \n**Compute requirements**: \n* A compute resource that supports Unity Catalog. See [Before you begin](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#before). \n**Permission requirements**: \n* The `USE CATALOG` and `USE SCHEMA` privileges on the catalog and schema that you add the table to, along with `CREATE TABLE` on the schema, or you must be the owner of the catalog or schema. See [Unity Catalog privileges and securable objects](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/privileges.html).\n* If your compute uses shared access mode, you need access to the tables in the Hive metastore, granted using legacy table access control. See [Before you begin](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#before). \n### Upgrade process \nTo upgrade managed tables in your Hive metastore to managed tables in Unity Catalog: \n1. In a notebook or the SQL query editor, run one of the following: \nDeep clone a managed table in the Hive metastore: \n```\nCREATE OR REPLACE TABLE ..\nDEEP CLONE hive_metastore..;\n\n``` \nShallow clone a managed table in the Hive metastore: \n```\nCREATE OR REPLACE TABLE ..\nSHALLOW CLONE hive_metastore..;\n\n``` \nFor information about additional parameters, including table properties, see [CREATE TABLE CLONE](https://docs.databricks.com/sql/language-manual/delta-clone.html).\n2. Grant account-level users or groups access to the new table. See [Manage privileges in Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/index.html).\n3. (Optional) Add a comment to the original Hive table that points users to the new Unity Catalog table. \nReturn to the original table in the `hive_metastore` catalog to add the table comment. 
To learn how to add table comments using Catalog Explorer, see [Add markdown comments to data objects using Catalog Explorer](https://docs.databricks.com/catalog-explorer/markdown-data-comments.html#manual). To learn how to add table comments using SQL statements in a notebook or the SQL query editor, see [COMMENT ON](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-comment.html). \nIf you use the following syntax in the table comment, notebooks and SQL query editor queries that reference the deprecated Hive table will display the deprecated table name using strikethrough text, display the comment as a warning, and provide a **Quick Fix** link to Databricks Assistant, which can update your code to reference the new table. \n```\nThis table is deprecated. Please use catalog.default.table instead of hive_metastore.schema.table.\n\n``` \nSee [Add comments to indicate that a Hive table has been migrated](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#deprecated-comment).\n4. After the table is migrated, users should update their existing queries and workloads to use the new table. \nIf you added a comment to the original Hive table like the one listed in the optional previous step, you can use the **Quick Fix** link and Databricks Assistant to help you find and modify workloads.\n5. Before you drop the old table, test for dependencies by revoking access to it and re-running related queries and workloads. \nDon\u2019t drop the old table if you are still relying on deprecation comments to help you find and update existing code that references the old table. Likewise, don\u2019t drop the old table if you performed a shallow clone. 
Shallow clones reference data from the source Hive table.\n\n", "chunk_id": "3f68fc68515c5353dd180ff0848b8eb7", "url": "https://docs.databricks.com/data-governance/unity-catalog/migrate.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Upgrade Hive tables and views to Unity Catalog\n##### Upgrade a Hive table to a Unity Catalog managed table using CREATE TABLE AS SELECT\n\nIf you cannot use or prefer not to use `CREATE TABLE CLONE` to migrate a table in your Hive metastore to a managed table in Unity Catalog, you can create a new managed table in Unity Catalog by querying the Hive table using `CREATE TABLE AS SELECT`. For information about the differences between `CREATE TABLE CLONE` and `CREATE TABLE AS SELECT`, see [Hive to Unity Catalog migration options](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#comparison-table). \n### Requirements \n**Compute requirements**: \n* A compute resource that supports Unity Catalog. See [Before you begin](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#before). \n**Permission requirements**: \n* The `USE CATALOG` and `USE SCHEMA` privileges on the catalog and schema that you add the table to, along with `CREATE TABLE` on the schema, or you must be the owner of the catalog or schema. See [Unity Catalog privileges and securable objects](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/privileges.html).\n* If your compute uses shared access mode, you need access to the tables in the Hive metastore, granted using legacy table access control. See [Before you begin](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#before). \n### Upgrade process \nTo upgrade a table in your Hive metastore to a managed table in Unity Catalog using `CREATE TABLE AS SELECT`: \n1. Create a new Unity Catalog table by querying the existing table. 
Replace the placeholder values: \n* ``: The Unity Catalog catalog for the new table.\n* ``: The Unity Catalog schema for the new table.\n* ``: A name for the Unity Catalog table.\n* ``: The schema for the Hive table, such as `default`.\n* ``: The name of the Hive table. \n```\nCREATE TABLE ..\nAS SELECT * FROM hive_metastore..;\n\n``` \n```\ndf = spark.table(\"hive_metastore..\")\n\ndf.write.saveAsTable(\nname = \"..\"\n)\n\n``` \n```\n%r\nlibrary(SparkR)\n\ndf = tableToDF(\"hive_metastore..\")\n\nsaveAsTable(\ndf = df,\ntableName = \"..\"\n)\n\n``` \n```\nval df = spark.table(\"hive_metastore..\")\n\ndf.write.saveAsTable(\ntableName = \"..\"\n)\n\n``` \nIf you want to migrate only some columns or rows, modify the `SELECT` statement. \nNote \nThe commands presented here create a [managed table](https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#create-a-managed-table) in which data is copied into a dedicated *managed storage location*. If instead you want to create an [external table](https://docs.databricks.com/data-governance/unity-catalog/create-tables.html#create-an-external-table), where the table is registered in Unity Catalog without moving the data in cloud storage, see [Upgrade a single Hive table to a Unity Catalog external table using the upgrade wizard](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#migrate-external). See also [Specify a managed storage location in Unity Catalog](https://docs.databricks.com/connect/unity-catalog/managed-storage.html).\n2. Grant account-level users or groups access to the new table. See [Manage privileges in Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/index.html).\n3. (Optional) Add a comment to the original Hive table that points users to the new Unity Catalog table. \nReturn to the original table in the `hive_metastore` catalog to add the table comment. 
To learn how to add table comments using Catalog Explorer, see [Add markdown comments to data objects using Catalog Explorer](https://docs.databricks.com/catalog-explorer/markdown-data-comments.html#manual). To learn how to add table comments using SQL statements in a notebook or the SQL query editor, see [COMMENT ON](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-comment.html). \nIf you use the following syntax in the table comment, notebooks and SQL query editor queries that reference the deprecated Hive table will display the deprecated table name using strikethrough text, display the comment as a warning, and provide a **Quick Fix** link to Databricks Assistant, which can update your code to reference the new table. \n```\nThis table is deprecated. Please use catalog.default.table instead of hive_metastore.schema.table.\n\n``` \nSee [Add comments to indicate that a Hive table has been migrated](https://docs.databricks.com/data-governance/unity-catalog/migrate.html#deprecated-comment).\n4. After the table is migrated, users should update their existing queries and workloads to use the new table. \nIf you added a comment to the original Hive table like the one listed in the optional previous step, you can use the **Quick Fix** link and Databricks Assistant to help you find and modify workloads.\n5. Before you drop the old table, test for dependencies by revoking access to it and re-running related queries and workloads. 
\nDon\u2019t drop the old table if you are still relying on deprecation comments to help you find and update existing code that references the old table.\n\n", "chunk_id": "3a49956c9fecdd682ae4fc7bc33b3d66", "url": "https://docs.databricks.com/data-governance/unity-catalog/migrate.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Upgrade Hive tables and views to Unity Catalog\n##### Upgrade a view to Unity Catalog\n\nAfter you upgrade all of a view\u2019s referenced tables to the same Unity Catalog metastore, you can [create a new view](https://docs.databricks.com/data-governance/unity-catalog/create-views.html) that references the new tables.\n\n#### Upgrade Hive tables and views to Unity Catalog\n##### Add comments to indicate that a Hive table has been migrated\n\nWhen you add a comment to the deprecated Hive table that points users to the new Unity Catalog table, notebooks and SQL query editor queries that reference the deprecated Hive table will display the deprecated table name using strikethrough text, display the comment as a warning, and provide a **Quick Fix** link to Databricks Assistant, which can update your code to reference the new table. \n![Hive table deprecation warning](https://docs.databricks.com/_images/hive-migration-table-comment.png) \nYour comment must use the following format: \n```\nThis table is deprecated. Please use catalog.default.table instead of hive_metastore.schema.table.\n\n``` \nTo learn how to add table comments using Catalog Explorer, see [Add markdown comments to data objects using Catalog Explorer](https://docs.databricks.com/catalog-explorer/markdown-data-comments.html#manual). 
To learn how to add table comments using SQL statements in a notebook or the SQL query editor, see [COMMENT ON](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-comment.html).\n\n", "chunk_id": "ad4262378deb285bb6776fa9a2319679", "url": "https://docs.databricks.com/data-governance/unity-catalog/migrate.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Upgrade Hive tables and views to Unity Catalog\n##### Use Databricks Assistant to update a deprecated table reference\n\nIf you see strikethrough text on a table name in a notebook cell or statement in the SQL query editor, hover over the table name to reveal a warning notice. If that warning notice describes the table as deprecated and displays the new table name, click **Quick Fix**, followed by **Fix Deprecation**. Databricks Assistant opens, offering to replace the deprecated table name with the new Unity Catalog table name. Follow the prompts to complete the task. \n![Video showing Hive table update using Databricks Assistant](https://docs.databricks.com/_images/hive-to-uc-sql.gif) \nSee also [Use Databricks Assistant](https://docs.databricks.com/notebooks/use-databricks-assistant.html).\n\n", "chunk_id": "68226cb6dffb9d6ad48b23536abfede8", "url": "https://docs.databricks.com/data-governance/unity-catalog/migrate.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Create and manage catalogs\n\nThis article shows how to create and manage catalogs in Unity Catalog. A catalog contains [schemas (databases)](https://docs.databricks.com/data-governance/unity-catalog/create-schemas.html), and a schema contains tables, views, volumes, models, and functions. \nNote \nIn some workspaces that were enabled for Unity Catalog automatically, a *workspace catalog* was created for you by default. If this catalog exists, all users in your workspace (and only your workspace) have access to it by default. 
See [Step 1: Confirm that your workspace is enabled for Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/get-started.html#auto-enabled-check). \nNote \nTo learn how to create a *foreign catalog*, a Unity Catalog object that mirrors a database in an external data system, see [Create a foreign catalog](https://docs.databricks.com/query-federation/index.html#foreign-catalog). See also [Manage and work with foreign catalogs](https://docs.databricks.com/query-federation/foreign-catalogs.html).\n\n#### Create and manage catalogs\n##### Requirements\n\nTo create a catalog: \n* You must be a Databricks metastore admin or have the `CREATE CATALOG` privilege on the metastore.\n* You must have a Unity Catalog metastore [linked to the workspace](https://docs.databricks.com/data-governance/unity-catalog/create-metastore.html) where you perform the catalog creation.\n* The cluster that you use to run a notebook to create a catalog must use a Unity Catalog-compliant access mode. See [Access modes](https://docs.databricks.com/compute/configure.html#access-mode). \nSQL warehouses always support Unity Catalog.\n\n", "chunk_id": "ecc74f280d6dfde086bc5c1b8f033007", "url": "https://docs.databricks.com/data-governance/unity-catalog/create-catalogs.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Create and manage catalogs\n##### Create a catalog\n\nTo create a catalog, you can use Catalog Explorer or a SQL command. \n1. Log in to a workspace that is linked to the metastore.\n2. Click ![Catalog icon](https://docs.databricks.com/_images/data-icon.png) **Catalog**.\n3. Click the **Create Catalog** button.\n4. Select the catalog type that you want to create: \n* **Standard** catalog: a securable object that organizes data assets that are managed by Unity Catalog. 
For all use cases except Lakehouse Federation.\n* **Foreign** catalog: a securable object in Unity Catalog that mirrors a database in an external data system using Lakehouse Federation. See [Overview of Lakehouse Federation setup](https://docs.databricks.com/query-federation/index.html#setup-overview).\n5. (Optional but strongly recommended) Specify a managed storage location. Requires the `CREATE MANAGED STORAGE` privilege on the target external location. See [Specify a managed storage location in Unity Catalog](https://docs.databricks.com/connect/unity-catalog/managed-storage.html). \nImportant \nIf your workspace does not have a metastore-level storage location, you must specify a managed storage location when you create a catalog.\n6. Click **Create**.\n7. (Optional) Specify the workspace that the catalog is bound to. \nBy default, the catalog is shared with all workspaces attached to the current metastore. If the catalog will contain data that should be restricted to specific workspaces, go to the **Workspaces** tab and add those workspaces. \nFor more information, see [(Optional) Assign a catalog to specific workspaces](https://docs.databricks.com/data-governance/unity-catalog/create-catalogs.html#catalog-binding).\n8. Assign permissions for your catalog. See [Unity Catalog privileges and securable objects](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/privileges.html). \n1. Run the following SQL command in a notebook or Databricks SQL editor. Items in brackets are optional. Replace the placeholder values: \n* ``: A name for the catalog.\n* ``: Optional but strongly recommended. Provide a storage location path if you want managed tables in this catalog to be stored in a location that is different than the default root storage configured for the metastore. \nImportant \nIf your workspace does not have a metastore-level storage location, you must specify a managed storage location when you create a catalog. 
\nThis path must be defined in an external location configuration, and you must have the `CREATE MANAGED STORAGE` privilege on the external location configuration. You can use the path that is defined in the external location configuration or a subpath (in other words, `'s3://depts/finance'` or `'s3://depts/finance/product'`). Requires Databricks Runtime 11.3 and above.\n* ``: Optional description or other comment.\nNote \nIf you are creating a foreign catalog (a securable object in Unity Catalog that mirrors a database in an external data system, used for Lakehouse Federation), the SQL command is `CREATE FOREIGN CATALOG` and the options are different. See [Create a foreign catalog](https://docs.databricks.com/query-federation/index.html#foreign-catalog). \n```\nCREATE CATALOG [ IF NOT EXISTS ] \n[ MANAGED LOCATION '' ]\n[ COMMENT ];\n\n``` \nFor example, to create a catalog named `example`: \n```\nCREATE CATALOG IF NOT EXISTS example;\n\n``` \nIf you want to limit catalog access to specific workspaces in your account, also known as workspace-catalog binding, see [Bind a catalog to one or more workspaces](https://docs.databricks.com/data-governance/unity-catalog/create-catalogs.html#bind). \nFor parameter descriptions, see [CREATE CATALOG](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-catalog.html).\n2. Assign privileges to the catalog. See [Unity Catalog privileges and securable objects](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/privileges.html). \nWhen you create a catalog, two schemas (databases) are automatically created: `default` and `information_schema`. \nYou can also create a catalog by using the [Databricks Terraform provider](https://docs.databricks.com/dev-tools/terraform/index.html) and [databricks\\_catalog](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/catalog). 
You can retrieve information about catalogs by using [databricks\\_catalogs](https://registry.terraform.io/providers/databricks/databricks/latest/docs/data-sources/catalogs).\n\n", "chunk_id": "775411aa728129c930257c05483cb575", "url": "https://docs.databricks.com/data-governance/unity-catalog/create-catalogs.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Create and manage catalogs\n##### (Optional) Assign a catalog to specific workspaces\n\nIf you use workspaces to isolate user data access, you may want to limit catalog access to specific workspaces in your account, also known as workspace-catalog binding. The default is to share the catalog with all workspaces attached to the current metastore. \nThe exception to this default is the *workspace catalog* that is created by default in workspaces that are enabled for Unity Catalog automatically (see [Automatic enablement of Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/get-started.html#enablement)). By default, this workspace catalog is bound only to your workspace, unless you choose to give other workspaces access to it. For important information about assigning permissions if you unbind this catalog, see [Unbind a catalog from a workspace](https://docs.databricks.com/data-governance/unity-catalog/create-catalogs.html#unbind). \nYou can allow read and write access to the catalog from a workspace (the default), or you can specify read-only access. If you specify read-only, then all write operations are blocked from that workspace to that catalog. \nTypical use cases for binding a catalog to specific workspaces include: \n* Ensuring that users can only access production data from a production workspace environment.\n* Ensuring that users can only process sensitive data from a dedicated workspace.\n* Giving users read-only access to production data from a developer workspace to enable development and testing. 
\nNote \nYou can also bind external locations and storage credentials to specific workspaces, limiting the ability to access data in external locations to privileged users in those workspaces. See [(Optional) Assign an external location to specific workspaces](https://docs.databricks.com/connect/unity-catalog/external-locations.html#workspace-binding) and [(Optional) Assign a storage credential to specific workspaces](https://docs.databricks.com/connect/unity-catalog/storage-credentials.html#workspace-binding). \n### Workspace-catalog binding example \nTake the example of production and development isolation. If you specify that your production data catalogs can only be accessed from production workspaces, this supersedes any individual grants that are issued to users. \n![Catalog-workspace binding diagram](https://docs.databricks.com/_images/catalog-bindings.png) \nIn this diagram, `prod_catalog` is bound to two production workspaces. Suppose a user has been granted access to a table in `prod_catalog` called `my_table` (using `GRANT SELECT ON my_table TO `). If the user tries to access `my_table` in the Dev workspace, they receive an error message. The user can access `my_table` only from the Prod ETL and Prod Analytics workspaces. \nWorkspace-catalog bindings are respected in all areas of the platform. For example, if you query the information schema, you see only the catalogs accessible in the workspace where you issue the query. Data lineage and search UIs likewise show only the catalogs that are assigned to the workspace (whether using bindings or by default). \n### Bind a catalog to one or more workspaces \nTo assign a catalog to specific workspaces, you can use Catalog Explorer or the Unity Catalog REST API. \n**Permissions required**: Metastore admin or catalog owner. 
\nNote \nMetastore admins can see all catalogs in a metastore using Catalog Explorer\u2014and catalog owners can see all catalogs they own in a metastore\u2014regardless of whether the catalog is assigned to the current workspace. Catalogs that are not assigned to the workspace appear grayed out, and no child objects are visible or queryable. \n1. Log in to a workspace that is linked to the metastore.\n2. Click ![Catalog icon](https://docs.databricks.com/_images/data-icon.png) **Catalog**.\n3. In the **Catalog** pane, on the left, click the catalog name. \nThe main Catalog Explorer pane defaults to the **Catalogs** list. You can also select the catalog there.\n4. On the **Workspaces** tab, clear the **All workspaces have access** checkbox. \nIf your catalog is already bound to one or more workspaces, this checkbox is already cleared.\n5. Click **Assign to workspaces** and enter or find the workspaces you want to assign.\n6. (Optional) Limit workspace access to read-only. \nOn the **Manage access level** menu, select **Change access to read-only**. \nYou can reverse this selection at any time by editing the catalog and selecting **Change access to read & write**. \nTo revoke access, go to the **Workspaces** tab, select the workspace, and click **Revoke**. \nThere are two APIs and two steps required to assign a catalog to a workspace. In the following examples, replace `` with your workspace instance name. To learn how to get the workspace instance name and workspace ID, see [Get identifiers for workspace objects](https://docs.databricks.com/workspace/workspace-details.html). To learn about getting access tokens, see [Authentication for Databricks automation - overview](https://docs.databricks.com/dev-tools/auth/index.html). \n1. 
Use the `catalogs` API to set the catalog\u2019s `isolation mode` to `ISOLATED`: \n```\ncurl -L -X PATCH 'https:///api/2.1/unity-catalog/catalogs/ \\\n-H 'Authorization: Bearer \\\n-H 'Content-Type: application/json' \\\n--data-raw '{\n\"isolation_mode\": \"ISOLATED\"\n}'\n\n``` \nThe default `isolation mode` is `OPEN` to all workspaces attached to the metastore.\n2. Use the update `bindings` API to assign the workspaces to the catalog: \n```\ncurl -L -X PATCH 'https:///api/2.1/unity-catalog/bindings/catalog/ \\\n-H 'Authorization: Bearer \\\n-H 'Content-Type: application/json' \\\n--data-raw '{\n\"add\": [{\"workspace_id\": , \"binding_type\": }...],\n\"remove\": [{\"workspace_id\": , \"binding_type\": \"}...]\n}'\n\n``` \nUse the `\"add\"` and `\"remove\"` properties to add or remove workspace bindings. `` can be either `\u201cBINDING_TYPE_READ_WRITE\u201d` (default) or `\u201cBINDING_TYPE_READ_ONLY\u201d`. \nTo list all workspace assignments for a catalog, use the list `bindings` API: \n```\ncurl -L -X GET 'https:///api/2.1/unity-catalog/bindings/catalog/ \\\n-H 'Authorization: Bearer \\\n\n``` \n### Unbind a catalog from a workspace \nInstructions for revoking workspace access to a catalog using Catalog Explorer or the `bindings` API are included in [Bind a catalog to one or more workspaces](https://docs.databricks.com/data-governance/unity-catalog/create-catalogs.html#bind). \nImportant \nIf your workspace was enabled for Unity Catalog automatically and you have a *workspace catalog*, workspace admins own that catalog and have all permissions on that catalog **in the workspace only**. If you unbind that catalog or bind it to other catalogs, you must grant any required permissions manually to the members of the workspace admins group as individual users or using account-level groups, because the workspace admins group is a workspace-local group. 
For more information about account groups vs workspace-local groups, see [Difference between account groups and workspace-local groups](https://docs.databricks.com/admin/users-groups/groups.html#account-vs-workspace-group).\n\n", "chunk_id": "cfae3b06b3e1abf5b43489717eea63a0", "url": "https://docs.databricks.com/data-governance/unity-catalog/create-catalogs.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Create and manage catalogs\n##### Add schemas to your catalog\n\nTo learn how to add schemas (databases) to your catalog, see [Create and manage schemas (databases)](https://docs.databricks.com/data-governance/unity-catalog/create-schemas.html).\n\n#### Create and manage catalogs\n##### View catalog details\n\nTo view information about a catalog, you can use Catalog Explorer or a SQL command. \n1. Log in to a workspace that is linked to the metastore.\n2. Click ![Catalog icon](https://docs.databricks.com/_images/data-icon.png) **Catalog**.\n3. In the **Catalog** pane, find the catalog and click its name. \nSome details are listed at the top of the page. Others can be viewed on the **Schemas**, **Details**, **Permissions**, and **Workspaces** tabs. \nRun the following SQL command in a notebook or Databricks SQL editor. Items in brackets are optional. Replace the placeholder ``. \nFor details, see [DESCRIBE CATALOG](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-describe-catalog.html). \n```\nDESCRIBE CATALOG ;\n\n``` \nUse `CATALOG EXTENDED` to get full details.\n\n", "chunk_id": "cb1df3a33853147c10a7a1e83d67b4ab", "url": "https://docs.databricks.com/data-governance/unity-catalog/create-catalogs.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Create and manage catalogs\n##### Delete a catalog\n\nTo delete (or drop) a catalog, you can use Catalog Explorer or a SQL command. To drop a catalog you must be its owner. 
\nYou must delete all schemas in the catalog except `information_schema` before you can delete a catalog. This includes the auto-created `default` schema. \n1. Log in to a workspace that is linked to the metastore.\n2. Click ![Catalog icon](https://docs.databricks.com/_images/data-icon.png) **Catalog**.\n3. In the **Catalog** pane, on the left, click the catalog you want to delete.\n4. In the detail pane, click the three-dot menu to the left of the **Create database** button and select **Delete**.\n5. On the **Delete catalog** dialog, click **Delete**. \nRun the following SQL command in a notebook or Databricks SQL editor. Items in brackets are optional. Replace the placeholder ``. \nFor parameter descriptions, see [DROP CATALOG](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-drop-catalog.html). \nIf you use `DROP CATALOG` without the `CASCADE` option, you must delete all schemas in the catalog except `information_schema` before you can delete the catalog. This includes the auto-created `default` schema. \n```\nDROP CATALOG [ IF EXISTS ] [ RESTRICT | CASCADE ]\n\n``` \nFor example, to delete a catalog named `vaccine` and its schemas: \n```\nDROP CATALOG vaccine CASCADE\n\n```\n\n", "chunk_id": "1af08aca9ac751bf37014f17066cd77b", "url": "https://docs.databricks.com/data-governance/unity-catalog/create-catalogs.html"} +{"chunked_text": "# Data governance with Unity Catalog\n## What is Unity Catalog?\n#### Create and manage catalogs\n##### Manage the default catalog\n\nA default catalog is configured for each workspace that is enabled for Unity Catalog. The default catalog lets you perform data operations without specifying a catalog. If you omit the top-level catalog name when you perform data operations, the default catalog is assumed. \nA workspace admin can view or switch the default catalog using the Admin Settings UI. You can also set the default catalog for a cluster using a Spark config. 
\nCommands that do not specify the catalog (for example `GRANT CREATE TABLE ON SCHEMA myschema TO mygroup`) are evaluated for the catalog in the following order: \n1. Is the catalog set for the session using a `USE CATALOG` statement or a JDBC setting?\n2. Is the Spark configuration `spark.databricks.sql.initial.catalog.namespace` set on the cluster?\n3. Is there a workspace default catalog set for the cluster? \n### The default catalog configuration when Unity Catalog is enabled \nThe default catalog that was initially configured for your workspace depends on how your workspace was enabled for Unity Catalog: \n* For some workspaces that were enabled for Unity Catalog automatically, the *workspace catalog* was set as the default catalog. See [Automatic enablement of Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/get-started.html#enablement).\n* For all other workspaces, the `hive_metastore` catalog was set as the default catalog. \nIf you are transitioning from the Hive metastore to Unity Catalog within an existing workspace, it typically makes sense to use `hive_metastore` as the default catalog to avoid impacting existing code that references the hive metastore. \n### Change the default catalog \nA workspace admin can change the default catalog for the workspace. Anyone with permission to create or edit a cluster can set a different default catalog for the cluster. \nWarning \nChanging the default catalog can break existing data operations that depend on it. \nTo configure a different default catalog for a workspace: \n1. Log in to your workspace as a workspace admin.\n2. Click your username in the top bar of the workspace and select **Settings** from the dropdown.\n3. Click the **Advanced** tab.\n4. On the **Default catalog for the workspace** row, enter the catalog name and click **Save**. \nRestart your SQL warehouses and clusters for the change to take effect. 
All new and restarted SQL warehouses and clusters will use this catalog as the workspace default. \nYou can also override the default catalog for a specific cluster by setting the following Spark configuration on the cluster. This approach is not available for SQL warehouses: \n```\nspark.databricks.sql.initial.catalog.name\n\n``` \nFor instructions, see [Spark configuration](https://docs.databricks.com/compute/configure.html#spark-configuration). \n### View the current default catalog \nTo get the current default catalog for your workspace, you can use a SQL statement in a notebook or SQL Editor query. A workspace admin can get the default catalog using the Admin Settings UI. \n1. Log in to your workspace as a workspace admin.\n2. Click your username in the top bar of the workspace and select **Settings** from the dropdown.\n3. Click the **Advanced** tab.\n4. On the **Default catalog for the workspace** row, view the catalog name. \nRun the following command in a notebook or SQL Editor query that is running on a SQL warehouse or Unity Catalog-compliant cluster. The workspace default catalog is returned as long as no `USE CATALOG` statement or JDBC setting has been set on the session, and as long as no `spark.databricks.sql.initial.catalog.namespace` config is set for the cluster. \n```\nSELECT current_catalog();\n\n```\n\n", "chunk_id": "c7345f2107f87832e3e61585850e6d1c", "url": "https://docs.databricks.com/data-governance/unity-catalog/create-catalogs.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is the Databricks CLI?\n#### Databricks CLI commands\n###### `auth` command group\n\nNote \nThis information applies to Databricks CLI versions 0.205 and above, which are in [Public Preview](https://docs.databricks.com/release-notes/release-types.html). To find your version of the Databricks CLI, run `databricks -v`. 
\nThe `auth` command group within the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html) enables you to: \n* List any available authentication configuration profiles.\n* Get information about an individual authentication configuration profile.\n* Use OAuth user-to-machine (U2M) authentication to authenticate the Databricks CLI with your Databricks accounts and workspaces.\n* Get information about any OAuth access tokens that the Databricks CLI might have cached.\n* Get details about the configuration that the Databricks CLI is using to authenticate. \nImportant \nBefore you use the Databricks CLI, be sure to [set up the Databricks CLI](https://docs.databricks.com/dev-tools/cli/install.html) and [set up authentication for the Databricks CLI](https://docs.databricks.com/dev-tools/cli/authentication.html). \nYou run `auth` commands by appending them to `databricks auth`. To display help for the `auth` command, run `databricks auth -h`.\n\n", "chunk_id": "0a79a9838c880defeaed65f7a5b2b820", "url": "https://docs.databricks.com/dev-tools/cli/auth-commands.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is the Databricks CLI?\n#### Databricks CLI commands\n###### `auth` command group\n####### List configuration profiles\n\nTo get a list of all of your available configuration profiles and to check whether they are valid, run the `auth profiles` command, as follows: \n```\ndatabricks auth profiles\n\n``` \nOutput (the ellipses represent omitted content, for brevity): \n```\nName Host Valid\nDEFAULT https:// YES\n https:// NO\n\n``` \nTo determine whether each profile is valid, the Databricks CLI runs a list workspaces command for each account-level profile and runs a get current user command for each workspace-level profile. If the command succeeds, a `YES` is displayed; otherwise, a `NO` displays. \nThe output of the `auth profiles` command does not display any access tokens. 
To display an access token, see [Get information about a configuration profile](https://docs.databricks.com/dev-tools/cli/auth-commands.html#get-config-profile). \nConfiguration profiles are stored in the file `~/.databrickscfg` on Linux or macOS, or `%USERPROFILE%\\.databrickscfg` on Windows by default. You can change the default path of this file by setting the environment variable `DATABRICKS_CONFIG_FILE`. To learn how to set environment variables, see your operating system\u2019s documentation. \nTo create configuration profiles, see the [configure command group](https://docs.databricks.com/dev-tools/cli/configure-commands.html).\n\n", "chunk_id": "77e8fe15f62b59b413e053e7bc4319f1", "url": "https://docs.databricks.com/dev-tools/cli/auth-commands.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is the Databricks CLI?\n#### Databricks CLI commands\n###### `auth` command group\n####### Get information about a configuration profile\n\nTo get information about an existing configuration profile, run the `auth env` command, where `` represents the name of the profile, and `` represents the Databricks account console URL or the Databricks workspace URL, as follows: \n```\ndatabricks auth env --profile \n\n# Or:\ndatabricks auth env --host \n\n``` \nTip \nYou can press `Tab` after `--profile` or `-p` to display a list of existing available configuration profiles to choose from, instead of entering the configuration profile name manually. \nFor example, here is the output for a profile that is configured with Databricks access token authentication: \n```\n{\n\"env\": {\n\"DATABRICKS_AUTH_TYPE\": \"pat\",\n\"DATABRICKS_CONFIG_PROFILE\": \"\",\n\"DATABRICKS_HOST\": \"\",\n\"DATABRICKS_TOKEN\": \"\"\n}\n}\n\n``` \nNote \nIf more than one profile matches the `--host` value, an error displays, stating that it cannot find a single matching profile. 
For example, you might have one profile that has only a host value and another profile that has the same host value but also a token value. In this case, the Databricks CLI does not choose a profile and stops. To help the Databricks CLI choose the desired profile, try specifying a different `--host` value. For `--host` values that are account console URLs, try specifying an `--account-id` value instead of a `--host` value. \nTo create a configuration profile, see the [configure command group](https://docs.databricks.com/dev-tools/cli/configure-commands.html).\n\n", "chunk_id": "e20111cdfaf04da91abbe5df66d4f74e", "url": "https://docs.databricks.com/dev-tools/cli/auth-commands.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is the Databricks CLI?\n#### Databricks CLI commands\n###### `auth` command group\n####### Authenticate with OAuth\n\nInstead of authenticating with Databricks by using [access tokens and configuration profiles](https://docs.databricks.com/dev-tools/cli/authentication.html#token-auth), you can use OAuth user-to-machine (U2M) authentication. OAuth provides tokens with faster expiration times than Databricks personal access tokens, and offers better server-side session invalidation and scoping. Because OAuth access tokens expire in less than an hour, this reduces the risk associated with accidentally checking tokens into source control. See [OAuth user-to-machine (U2M) authentication](https://docs.databricks.com/dev-tools/auth/oauth-u2m.html). 
\nTo configure and set up OAuth U2M authentication, see [OAuth user-to-machine (U2M) authentication](https://docs.databricks.com/dev-tools/cli/authentication.html#u2m-auth).\n\n###### `auth` command group\n####### Get OAuth access token details\n\nIf you want to see information about the cached OAuth access token that the Databricks CLI previously generated for a Databricks workspace, run the `auth token` command, where `` represents the Databricks workspace\u2019s URL, as follows: \n```\ndatabricks auth token \n\n``` \nOutput: \n```\n{\n\"access_token\": \"\",\n\"token_type\": \"Bearer\",\n\"expiry\": \"\"\n}\n\n```\n\n", "chunk_id": "e3dcd5d8c634e1dc38d54f45c0c70070", "url": "https://docs.databricks.com/dev-tools/cli/auth-commands.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is the Databricks CLI?\n#### Databricks CLI commands\n###### `auth` command group\n####### Get authentication details\n\nTo get details about the configuration that the Databricks CLI is using to authenticate, run the `auth describe` command. \nIf no options are specified, the `auth describe` command follows the [Default order of evaluation for client unified authentication methods and credentials](https://docs.databricks.com/dev-tools/auth/index.html#order-of-eval). \n```\ndatabricks auth describe\n\n``` \nOutput: \n```\nHost: https://\nUser: @\nAuthenticated with: \n-----\nCurrent configuration:\n\u2713 host: https:// (from //.databrickscfg config file)\n\u2713 profile: default\n\u2713 auth_type: (from //.databrickscfg config file)\n\n``` \nTo specify that information about a specific Databricks workspace is used, specify the `--host` option along with the workspace\u2019s URL. 
\n```\ndatabricks auth describe --host https://\n\n``` \nOutput: \n```\nHost: https://\nUser: @\nAuthenticated with: \n-----\nCurrent configuration:\n\u2713 host: https:// (from --host flag)\n\u2713 profile: default\n\u2713 auth_type: \n\n``` \nTo specify that information about a specific Databricks account is used, specify the `--host` option along with the Databricks account console URL, . \n```\ndatabricks auth describe --host \n\n``` \nOutput: \n```\nHost: \nUser: @\nAccountId: \nAuthenticated with: \n-----\nCurrent configuration:\n\u2713 host: (from --host flag)\n\u2713 account_id: \n\u2713 profile: default\n\u2713 auth_type: \n\n``` \nTo specify that information about a specific Databricks configuration profile is used, specify the `-p` or `--profile` option along with the profile\u2019s name. \n```\ndatabricks auth describe -p \n\n``` \nOutput: \n```\nHost: https://\nUser: @\nAuthenticated with: \n-----\nCurrent configuration:\n\u2713 host: https:// (from //.databrickscfg config file)\n\u2713 token: ******** (from //.databrickscfg config file)\n\u2713 profile: (from --profile flag)\n\u2713 auth_type: \n\n``` \nTo include sensitive information in the output (such as Databricks personal access tokens and client secrets), specify the `--sensitive` option. 
\n```\ndatabricks auth describe --sensitive\n\n``` \nOutput: \n```\nHost: https://\nUser: @\nAuthenticated with: pat\n-----\nCurrent configuration:\n\u2713 host: https:// (from //.databrickscfg config file)\n\u2713 token: (from //.databrickscfg config file)\n\u2713 profile: \n\u2713 auth_type: pat\n\n```\n\n", "chunk_id": "c26db9531cb41bed2232dd37864277e9", "url": "https://docs.databricks.com/dev-tools/cli/auth-commands.html"} +{"chunked_text": "# What is Databricks?\n### Databricks concepts\n\nThis article introduces the set of fundamental concepts you need to understand in order to use Databricks effectively.\n\n### Databricks concepts\n#### Accounts and workspaces\n\nIn Databricks, a *workspace* is a Databricks deployment in the cloud that functions as an environment for your team to access Databricks assets. Your organization can choose to have either multiple workspaces or just one, depending on its needs. \nA Databricks *account* represents a single entity that can include multiple workspaces. Accounts enabled for [Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/index.html) can be used to manage users and their access to data centrally across all of the workspaces in the account. Billing and support are also handled at the account level.\n\n### Databricks concepts\n#### Billing: Databricks units (DBUs)\n\nDatabricks bills based on Databricks units (DBUs), units of processing capability per hour based on VM instance type. \nSee the [Databricks on AWS pricing estimator](https://databricks.com/product/aws-pricing/instance-types).\n\n", "chunk_id": "7c49decebaff9a5b950cbbb729b33690", "url": "https://docs.databricks.com/getting-started/concepts.html"} +{"chunked_text": "# What is Databricks?\n### Databricks concepts\n#### Authentication and authorization\n\nThis section describes concepts that you need to know when you manage Databricks identities and their access to Databricks assets. 
\n### User \nA unique individual who has access to the system. User identities are represented by email addresses. See [Manage users](https://docs.databricks.com/admin/users-groups/users.html). \n### Service principal \nA service identity for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms. Service principals are represented by an application ID. See [Manage service principals](https://docs.databricks.com/admin/users-groups/service-principals.html). \n### Group \nA collection of identities. Groups simplify identity management, making it easier to assign access to workspaces, data, and other securable objects. All Databricks identities can be assigned as members of groups. See [Manage groups](https://docs.databricks.com/admin/users-groups/groups.html). \n### Access control list (ACL) \nA list of permissions attached to a workspace, cluster, job, table, or experiment. An ACL specifies which users or system processes are granted access to the objects, as well as what operations are allowed on the assets. Each entry in a typical ACL specifies a subject and an operation. See [Access control lists](https://docs.databricks.com/security/auth-authz/access-control/index.html). \n### Personal access token \nAn opaque string used to authenticate to the REST API and used by tools in the [Technology partners](https://docs.databricks.com/integrations/index.html) to connect to SQL warehouses. See [Databricks personal access token authentication](https://docs.databricks.com/dev-tools/auth/pat.html). 
\n### UI \nThe Databricks UI is a graphical interface for interacting with features, such as workspace folders and their contained objects, data objects, and computational resources.\n\n", "chunk_id": "c9a3931a5e0af08d508b0ac269452a7d", "url": "https://docs.databricks.com/getting-started/concepts.html"} +{"chunked_text": "# What is Databricks?\n### Databricks concepts\n#### Data science & engineering\n\n[Data science & engineering](https://docs.databricks.com/workspace-index.html) tools aid collaboration among data scientists, data engineers, and data analysts. This section describes the fundamental concepts. \n### Workspace \nA [workspace](https://docs.databricks.com/workspace/index.html) is an environment for accessing all of your Databricks assets. A workspace organizes objects (notebooks, libraries, dashboards, and experiments) into [folders](https://docs.databricks.com/workspace/workspace-objects.html#folders) and provides access to data objects and computational resources. \n### Notebook \nA web-based interface for creating data science and machine learning workflows that can contain runnable commands, visualizations, and narrative text. See [Introduction to Databricks notebooks](https://docs.databricks.com/notebooks/index.html). \n### Dashboard \nAn interface that provides organized access to visualizations. See [Dashboards in notebooks](https://docs.databricks.com/notebooks/dashboards.html). \n### Library \nA package of code available to the notebook or job running on your cluster. Databricks runtimes include many [libraries](https://docs.databricks.com/libraries/index.html) and you can add your own. \n### Git folder (formerly Repos) \nA folder whose contents are co-versioned together by syncing them to a remote Git repository. [Databricks Git folders](https://docs.databricks.com/repos/index.html) integrate with Git to provide source and version control for your projects. 
\n### Experiment \nA collection of [MLflow runs](https://docs.databricks.com/mlflow/tracking.html) for training a machine learning model. See [Organize training runs with MLflow experiments](https://docs.databricks.com/mlflow/experiments.html).\n\n", "chunk_id": "6ce4ac495db43fb7865b294c10bec5eb", "url": "https://docs.databricks.com/getting-started/concepts.html"} +{"chunked_text": "# What is Databricks?\n### Databricks concepts\n#### Databricks interfaces\n\nThis section describes the interfaces that Databricks supports, in addition to the UI, for accessing your assets: API and command-line (CLI). \n### REST API \nThe Databricks REST API provides endpoints for modifying or requesting information about Databricks account and workspace objects. See [account reference](https://docs.databricks.com/api/account/introduction) and [workspace reference](https://docs.databricks.com/api/workspace/introduction). \n### CLI \nThe Databricks CLI is hosted on [GitHub](https://github.com/databricks/cli). The CLI is built on top of the Databricks REST API.\n\n", "chunk_id": "9841ce8fc932ed8820a4472411c1ec00", "url": "https://docs.databricks.com/getting-started/concepts.html"} +{"chunked_text": "# What is Databricks?\n### Databricks concepts\n#### Data management\n\nThis section describes the objects that hold the data on which you perform analytics and feed into machine learning algorithms. \n### Unity Catalog \nUnity Catalog is a unified governance solution for data and AI assets on Databricks that provides centralized access control, auditing, lineage, and data discovery capabilities across Databricks workspaces. See [What is Unity Catalog?](https://docs.databricks.com/data-governance/unity-catalog/index.html). \n### DBFS root \nImportant \nStoring and accessing data using DBFS root or DBFS mounts is a deprecated pattern and not recommended by Databricks. Instead, Databricks recommends using Unity Catalog to manage access to all data. 
See [What is Unity Catalog?](https://docs.databricks.com/data-governance/unity-catalog/index.html). \nThe DBFS root is a storage location available to all users by default. See [What is DBFS?](https://docs.databricks.com/dbfs/index.html). \n### Database \nA collection of data objects, such as tables or views and functions, that is organized so that it can be easily accessed, managed, and updated. See [What is a database?](https://docs.databricks.com/lakehouse/data-objects.html#database) \n### Table \nA representation of structured data. You query tables with Apache Spark SQL and Apache Spark APIs. See [What is a table?](https://docs.databricks.com/lakehouse/data-objects.html#table) \n### Delta table \nBy default, all tables created in Databricks are Delta tables. Delta tables are based on the [Delta Lake open source project](https://delta.io/), a framework for high-performance ACID table storage over cloud object stores. A Delta table stores data as a directory of files on cloud object storage and registers table metadata to the metastore within a catalog and schema. \nFind out more about [technologies branded as Delta](https://docs.databricks.com/introduction/delta-comparison.html). \n### Metastore \nThe component that stores all the structure information of the various tables and partitions in the data warehouse including column and column type information, the serializers and deserializers necessary to read and write data, and the corresponding files where the data is stored. See [What is a metastore?](https://docs.databricks.com/lakehouse/data-objects.html#metastore) \nEvery Databricks deployment has a central Hive metastore accessible by all clusters to persist table metadata. You also have the option to use an existing [external Hive metastore](https://docs.databricks.com/archive/external-metastores/external-hive-metastore.html). \n### Visualization \nA graphical presentation of the result of running a query. 
See [Visualizations in Databricks notebooks](https://docs.databricks.com/visualizations/index.html).\n\n", "chunk_id": "3bf992bf6355da511fee226c9b6c2fe2", "url": "https://docs.databricks.com/getting-started/concepts.html"} +{"chunked_text": "# What is Databricks?\n### Databricks concepts\n#### Computation management\n\nThis section describes concepts that you need to know to run computations in Databricks. \n### Cluster \nA set of computation resources and configurations on which you run notebooks and jobs. There are two types of clusters: all-purpose and job. See [Compute](https://docs.databricks.com/compute/index.html). \n* You create an *all-purpose cluster* using the UI, CLI, or REST API. You can manually terminate and restart an all-purpose cluster. Multiple users can share such clusters to do collaborative interactive analysis.\n* The Databricks job scheduler creates *a job cluster* when you run a [job](https://docs.databricks.com/workflows/jobs/create-run-jobs.html) on a *new job cluster* and terminates the cluster when the job is complete. You *cannot* restart a job cluster. \n### Pool \nA set of idle, ready-to-use instances that reduce cluster start and auto-scaling times. When attached to a pool, a cluster allocates its driver and worker nodes from the pool. See [Pool configuration reference](https://docs.databricks.com/compute/pools.html). \nIf the pool does not have sufficient idle resources to accommodate the cluster\u2019s request, the pool expands by allocating new instances from the instance provider. When an attached cluster is terminated, the instances it used are returned to the pool and can be reused by a different cluster. \n### Databricks runtime \nThe set of core components that run on the clusters managed by Databricks. 
See [Compute](https://docs.databricks.com/compute/index.html). Databricks has the following runtimes: \n* [Databricks Runtime](https://docs.databricks.com/release-notes/runtime/index.html) includes Apache Spark but also adds a number of components and updates that substantially improve the usability, performance, and security of big data analytics.\n* [Databricks Runtime for Machine Learning](https://docs.databricks.com/machine-learning/index.html) is built on Databricks Runtime and provides prebuilt machine learning infrastructure that is integrated with all of the capabilities of the Databricks workspace. It contains multiple popular libraries, including TensorFlow, Keras, PyTorch, and XGBoost. \n### Workflows \nFrameworks to develop and run data processing pipelines: \n* [Jobs](https://docs.databricks.com/workflows/index.html#what-is-jobs): A non-interactive mechanism for running a notebook or library either immediately or on a scheduled basis.\n* [Delta Live Tables](https://docs.databricks.com/delta-live-tables/index.html): A framework for building reliable, maintainable, and testable data processing pipelines. \nSee [Introduction to Databricks Workflows](https://docs.databricks.com/workflows/index.html). \n### Workload \nDatabricks identifies two types of workloads subject to different [pricing](https://databricks.com/product/pricing) schemes: data engineering (job) and data analytics (all-purpose). \n* **Data engineering** An (automated) workload runs on *a job cluster* which the Databricks job scheduler creates for each workload.\n* **Data analytics** An (interactive) workload runs on an *all-purpose cluster*. Interactive workloads typically run commands within a Databricks [notebook](https://docs.databricks.com/notebooks/index.html). However, running a *job* on an *existing all-purpose* cluster is also treated as an interactive workload. 
\n### Execution context \nThe state for a read\u2013eval\u2013print loop (REPL) environment for each supported programming language. The languages supported are Python, R, Scala, and SQL.\n\n", "chunk_id": "3788a81afd7901e95ee5bfeec8c3388d", "url": "https://docs.databricks.com/getting-started/concepts.html"} +{"chunked_text": "# What is Databricks?\n### Databricks concepts\n#### Machine learning\n\n[Machine Learning](https://docs.databricks.com/machine-learning/index.html) on Databricks is an integrated end-to-end environment incorporating managed services for experiment tracking, model training, feature development and management, and feature and model serving. \n### Experiments \nThe main unit of organization for tracking machine learning model development. See [Organize training runs with MLflow experiments](https://docs.databricks.com/mlflow/experiments.html). Experiments organize, display, and control access to individual [logged runs of model training code](https://docs.databricks.com/mlflow/tracking.html). \n### Feature Store \nA centralized repository of features. See [What is a feature store?](https://docs.databricks.com/machine-learning/feature-store/index.html) Feature Store enables feature sharing and discovery across your organization and also ensures that the same feature computation code is used for model training and inference. \n### Models & model registry \nA [trained machine learning or deep learning model](https://docs.databricks.com/machine-learning/train-model/index.html) that has been registered in [Model Registry](https://docs.databricks.com/machine-learning/manage-model-lifecycle/index.html).\n\n", "chunk_id": "3497b59ee72f32ed852ae57ad0214510", "url": "https://docs.databricks.com/getting-started/concepts.html"} +{"chunked_text": "# What is Databricks?\n### Databricks concepts\n#### SQL\n\n### SQL REST API \nAn interface that allows you to automate tasks on SQL objects. 
See [SQL API](https://docs.databricks.com/api/workspace/statementexecution). \n### Dashboard \nA presentation of data visualizations and commentary. See [Dashboards](https://docs.databricks.com/dashboards/index.html). For legacy dashboards, see [Legacy dashboards](https://docs.databricks.com/sql/user/dashboards/index.html). \n### SQL queries \nThis section describes concepts that you need to know to run SQL queries in Databricks. \n* **[Query](https://docs.databricks.com/sql/user/queries/index.html)**: A valid SQL statement.\n* **[SQL warehouse](https://docs.databricks.com/compute/sql-warehouse/index.html)**: A computation resource on which you execute SQL queries.\n* **[Query history](https://docs.databricks.com/sql/user/queries/query-history.html)**: A list of executed queries and their performance characteristics.\n\n", "chunk_id": "ba66ced3b9647d66afd18bd0f9cd584d", "url": "https://docs.databricks.com/getting-started/concepts.html"} +{"chunked_text": "# What is Databricks?\n## What is a data lakehouse?\n#### Data objects in the Databricks lakehouse\n\nThe Databricks lakehouse organizes data stored with Delta Lake in cloud object storage with familiar relations like databases, tables, and views. This model combines many of the benefits of an enterprise data warehouse with the scalability and flexibility of a data lake. 
Learn more about how this model works, and the relationship between object data and metadata, so that you can apply best practices when designing and implementing a Databricks lakehouse for your organization.\n\n", "chunk_id": "32fdc595b20eadcd31e012ec3346ef37", "url": "https://docs.databricks.com/lakehouse/data-objects.html"} +{"chunked_text": "# What is Databricks?\n## What is a data lakehouse?\n#### Data objects in the Databricks lakehouse\n##### What data objects are in the Databricks lakehouse?\n\nThe Databricks lakehouse architecture combines data stored with the Delta Lake protocol in cloud object storage with metadata registered to a [metastore](https://docs.databricks.com/lakehouse/data-objects.html#metastore). There are five primary objects in the Databricks lakehouse: \n* **[Catalog](https://docs.databricks.com/lakehouse/data-objects.html#catalog)**: a grouping of databases.\n* **[Database](https://docs.databricks.com/lakehouse/data-objects.html#database)** or schema: a grouping of objects in a catalog. Databases contain tables, views, and functions.\n* **[Table](https://docs.databricks.com/lakehouse/data-objects.html#table)**: a collection of rows and columns stored as data files in object storage.\n* **[View](https://docs.databricks.com/lakehouse/data-objects.html#view)**: a saved query typically against one or more tables or data sources.\n* **[Function](https://docs.databricks.com/lakehouse/data-objects.html#function)**: saved logic that returns a scalar value or set of rows. 
\n![Unity Catalog object model diagram](https://docs.databricks.com/_images/object-model.png) \nFor information on securing objects with Unity Catalog, see [securable objects model](https://docs.databricks.com/data-governance/unity-catalog/index.html#object-model).\n\n", "chunk_id": "7686ce4c0a65b8dcae1cff4b474f7ea6", "url": "https://docs.databricks.com/lakehouse/data-objects.html"} +{"chunked_text": "# What is Databricks?\n## What is a data lakehouse?\n#### Data objects in the Databricks lakehouse\n##### What is a metastore?\n\nThe metastore contains all of the metadata that defines data objects in the lakehouse. Databricks provides the following metastore options: \n* **[Unity Catalog metastore](https://docs.databricks.com/data-governance/unity-catalog/index.html)**: Unity Catalog provides centralized access control, auditing, lineage, and data discovery capabilities. You create Unity Catalog metastores at the Databricks account level, and a single metastore can be used across multiple workspaces. \nEach Unity Catalog metastore is configured with a root storage location in an S3 bucket in your AWS account. This storage location is used by default for storing data for managed tables. \nIn Unity Catalog, data is secure by default. Initially, users have no access to data in a metastore. Access can be granted by either a metastore admin or the owner of an object. Securable objects in Unity Catalog are hierarchical and privileges are inherited downward. Unity Catalog offers a single place to administer data access policies. Users can access data in Unity Catalog from any workspace that the metastore is attached to. For more information, see [Manage privileges in Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/index.html).\n* **Built-in Hive metastore (legacy)**: Each Databricks workspace includes a built-in Hive metastore as a managed service. 
An instance of the metastore deploys to each cluster and securely accesses metadata from a central repository for each customer workspace. \nThe Hive metastore provides a less centralized data governance model than Unity Catalog. By default, a cluster allows all users to access all data managed by the workspace\u2019s built-in Hive metastore unless table access control is enabled for that cluster. For more information, see [Hive metastore table access control (legacy)](https://docs.databricks.com/data-governance/table-acls/index.html). \nTable access controls are not stored at the account level, and therefore they must be configured separately for each workspace. To take advantage of the centralized and streamlined data governance model provided by Unity Catalog, Databricks recommends that you [upgrade the tables managed by your workspace\u2019s Hive metastore to the Unity Catalog metastore](https://docs.databricks.com/data-governance/unity-catalog/migrate.html). \n* **[External Hive metastore (legacy)](https://docs.databricks.com/archive/external-metastores/index.html)**: You can also bring your own metastore to Databricks. Databricks clusters can connect to existing external Apache Hive metastores or the AWS Glue Data Catalog. You can use table access control to manage permissions in an external metastore. Table access controls are not stored in the external metastore, and therefore they must be configured separately for each workspace. Databricks recommends that you use Unity Catalog instead for its simplicity and account-centered governance model. 
\nRegardless of the metastore that you use, Databricks stores all table data in object storage in your cloud account.\n\n", "chunk_id": "67cf5a353c9122371c97643a61b9fe2d", "url": "https://docs.databricks.com/lakehouse/data-objects.html"} +{"chunked_text": "# What is Databricks?\n## What is a data lakehouse?\n#### Data objects in the Databricks lakehouse\n##### What is a catalog?\n\nA catalog is the highest abstraction (or coarsest grain) in the Databricks lakehouse relational model. Every database will be associated with a catalog. Catalogs exist as objects within a metastore. \nBefore the introduction of Unity Catalog, Databricks used a two-tier namespace. Catalogs are the third tier in the Unity Catalog namespacing model: \n```\ncatalog_name.database_name.table_name\n\n``` \nThe built-in Hive metastore only supports a single catalog, `hive_metastore`.\n\n#### Data objects in the Databricks lakehouse\n##### What is a database?\n\nA database is a collection of data objects, such as tables or views (also called \u201crelations\u201d), and functions. In Databricks, the terms \u201cschema\u201d and \u201cdatabase\u201d are used interchangeably (whereas in many relational systems, a database is a collection of schemas). \nDatabases will always be associated with a location on cloud object storage. You can optionally specify a `LOCATION` when registering a database, keeping in mind that: \n* The `LOCATION` associated with a database is always considered a managed location.\n* Creating a database does not create any files in the target location.\n* The `LOCATION` of a database will determine the default location for data of all tables registered to that database.\n* Successfully dropping a database will recursively drop all data and files stored in a managed location. \nThis interaction between locations managed by database and data files is very important. 
To avoid accidentally deleting data: \n* Do not share database locations across multiple database definitions.\n* Do not register a database to a location that already contains data.\n* To manage data life cycle independently of database, save data to a location that is not nested under any database locations.\n\n", "chunk_id": "29694de643868d78423401e0a719ea4a", "url": "https://docs.databricks.com/lakehouse/data-objects.html"} +{"chunked_text": "# What is Databricks?\n## What is a data lakehouse?\n#### Data objects in the Databricks lakehouse\n##### What is a table?\n\nA Databricks table is a collection of structured data. A Delta table stores data as a directory of files on cloud object storage and registers table metadata to the metastore within a catalog and schema. As Delta Lake is the default format for tables created in Databricks, all tables created in Databricks are Delta tables, by default. Because Delta tables store data in cloud object storage and provide references to data through a metastore, users across an organization can access data using their preferred APIs; on Databricks, this includes SQL, Python, PySpark, Scala, and R. \nNote that it is possible to create tables on Databricks that are not Delta tables. These tables are not backed by Delta Lake, and will not provide the ACID transactions and optimized performance of Delta tables. Tables falling into this category include tables registered against data in external systems and tables registered against other file formats in the data lake. See [Connect to data sources](https://docs.databricks.com/connect/index.html). \nThere are two kinds of tables in Databricks, [managed](https://docs.databricks.com/lakehouse/data-objects.html#managed-table) and [unmanaged](https://docs.databricks.com/lakehouse/data-objects.html#unmanaged-table) (or external) tables. 
\nNote \nThe [Delta Live Tables](https://docs.databricks.com/lakehouse/data-objects.html#dlt) distinction between live tables and streaming live tables is not enforced from the table perspective. \n### What is a managed table? \nDatabricks manages both the metadata and the data for a managed table; when you drop a table, you also delete the underlying data. Data analysts and other users that mostly work in SQL may prefer this behavior. Managed tables are the default when creating a table. The data for a managed table resides in the `LOCATION` of the database it is registered to. This managed relationship between the data location and the database means that in order to move a managed table to a new database, you must rewrite all data to the new location. \nThere are a number of ways to create managed tables, including: \n```\nCREATE TABLE table_name AS SELECT * FROM another_table\n\n``` \n```\nCREATE TABLE table_name (field_name1 INT, field_name2 STRING)\n\n``` \n```\ndf.write.saveAsTable(\"table_name\")\n\n``` \n### What is an unmanaged table? \nDatabricks only manages the metadata for unmanaged (external) tables; when you drop a table, you do not affect the underlying data. Unmanaged tables will always specify a `LOCATION` during table creation; you can either register an existing directory of data files as a table or provide a path when a table is first defined. Because data and metadata are managed independently, you can rename a table or register it to a new database without needing to move any data. Data engineers often prefer unmanaged tables and the flexibility they provide for production data. 
\nThere are a number of ways to create unmanaged tables, including: \n```\nCREATE TABLE table_name\nUSING DELTA\nLOCATION '/path/to/existing/data'\n\n``` \n```\nCREATE TABLE table_name\n(field_name1 INT, field_name2 STRING)\nLOCATION '/path/to/empty/directory'\n\n``` \n```\ndf.write.option(\"path\", \"/path/to/empty/directory\").saveAsTable(\"table_name\")\n\n```\n\n", "chunk_id": "f89802c34445e1514db2fb0779106947", "url": "https://docs.databricks.com/lakehouse/data-objects.html"} +{"chunked_text": "# What is Databricks?\n## What is a data lakehouse?\n#### Data objects in the Databricks lakehouse\n##### What is a view?\n\nA view stores the text for a query typically against one or more data sources or tables in the metastore. In Databricks, a view is equivalent to a Spark DataFrame persisted as an object in a database. Unlike DataFrames, you can query views from any part of the Databricks product, assuming you have permission to do so. Creating a view does not process or write any data; only the query text is registered to the metastore in the associated database.\n\n#### Data objects in the Databricks lakehouse\n##### What is a temporary view?\n\nA temporary view has a limited scope and persistence and is not registered to a schema or catalog. The lifetime of a temporary view differs based on the environment you\u2019re using: \n* In notebooks and jobs, temporary views are scoped to the notebook or script level. They cannot be referenced outside of the notebook in which they are declared, and will no longer exist when the notebook detaches from the cluster.\n* In Databricks SQL, temporary views are scoped to the query level. Multiple statements within the same query can use the temp view, but it cannot be referenced in other queries, even within the same dashboard.\n* Global temporary views are scoped to the cluster level and can be shared between notebooks or jobs that share computing resources. 
Databricks recommends using views with appropriate table ACLs instead of global temporary views.\n\n#### Data objects in the Databricks lakehouse\n##### What is a function?\n\nFunctions allow you to associate user-defined logic with a database. Functions can return either scalar values or sets of rows. Functions are used to aggregate data. Databricks allows you to save functions in various languages depending on your execution context, with SQL being broadly supported. You can use functions to provide managed access to custom logic across a variety of contexts on the Databricks product.\n\n", "chunk_id": "082dce3d53440a5b25efac853da4e8c9", "url": "https://docs.databricks.com/lakehouse/data-objects.html"} +{"chunked_text": "# What is Databricks?\n## What is a data lakehouse?\n#### Data objects in the Databricks lakehouse\n##### How do relational objects work in Delta Live Tables?\n\n[Delta Live Tables](https://docs.databricks.com/delta-live-tables/index.html) uses declarative syntax to define and manage DDL, DML, and infrastructure deployment. Delta Live Tables uses the concept of a \u201cvirtual schema\u201d during logic planning and execution. Delta Live Tables can interact with other databases in your Databricks environment, and Delta Live Tables can publish and persist tables for querying elsewhere by specifying a target database in the pipeline configuration settings. \nAll tables created in Delta Live Tables are Delta tables. When using Unity Catalog with Delta Live Tables, all tables are Unity Catalog managed tables. If Unity Catalog is not active, tables can be declared as either managed or unmanaged tables. \nWhile views can be declared in Delta Live Tables, these should be thought of as temporary views scoped to the pipeline. Temporary tables in Delta Live Tables are a unique concept: these tables persist data to storage but do not publish data to the target database. 
\nSome operations, such as `APPLY CHANGES INTO`, will register both a table and view to the database; the table name will begin with an underscore (`_`) and the view will have the table name declared as the target of the `APPLY CHANGES INTO` operation. The view queries the corresponding hidden table to materialize the results.\n\n", "chunk_id": "907faffccfc0d6d76df5c915da74c441", "url": "https://docs.databricks.com/lakehouse/data-objects.html"} +{"chunked_text": "# Introduction to the well-architected data lakehouse\n### Download lakehouse reference architectures\n\nThis article covers architectural guidance for the lakehouse in terms of data source, ingestion, transformation, querying and processing, serving, analysis/output, and storage. \nEach reference architecture has a downloadable PDF in 11 x 17 (A3) format.\n\n### Download lakehouse reference architectures\n#### Generic reference architecture\n\n![Generic reference architecture of the lakehouse](https://docs.databricks.com/_images/ref-arch-overview-generic.png) \n**[Download: Generic lakehouse reference architecture for Databricks (PDF)](https://docs.databricks.com/_extras/documents/reference-architecture-databricks-generic.pdf)**\n\n", "chunk_id": "1dcecc5d581621e1e492b2d7de321673", "url": "https://docs.databricks.com/lakehouse-architecture/reference.html"} +{"chunked_text": "# Introduction to the well-architected data lakehouse\n### Download lakehouse reference architectures\n#### Organization of the reference architectures\n\nThe reference architecture is structured along the swim lanes *Source*, *Ingest*, *Transform*, *Query and Process*, *Serve*, *Analysis*, and *Storage*: \n* **Source** \nThe architecture distinguishes between semi-structured and unstructured data (sensors and IoT, media, files/logs), and structured data (RDBMS, business applications). 
SQL sources (RDBMS) can also be integrated into the lakehouse and [Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/index.html) without ETL through [lakehouse federation](https://docs.databricks.com/query-federation/index.html). In addition, data might be loaded from other cloud providers.\n* **Ingest** \nData can be ingested into the lakehouse via batch or streaming: \n+ Files delivered to cloud storage can be loaded directly using the Databricks [Auto Loader](https://docs.databricks.com/ingestion/auto-loader/index.html).\n+ For batch ingestion of data from enterprise applications into [Delta Lake](https://docs.databricks.com/delta/index.html), the [Databricks lakehouse](https://docs.databricks.com/lakehouse/index.html) relies on [partner ingest tools](https://docs.databricks.com/partner-connect/ingestion.html) with specific adapters for these systems of record.\n+ Streaming events can be ingested directly from event streaming systems such as Kafka using Databricks [Structured Streaming](https://docs.databricks.com/structured-streaming/index.html). Streaming sources can be sensors, IoT, or [change data capture](https://docs.databricks.com/delta-live-tables/cdc.html) processes.\n* **Storage** \nData is typically stored in the cloud storage system where the ETL pipelines use the [medallion architecture](https://docs.databricks.com/lakehouse/medallion.html) to store data in a curated way as [Delta files/tables](https://delta.io/).\n* **Transform** and **Query and process** \nThe Databricks lakehouse uses its engines [Apache Spark](https://docs.databricks.com/spark/index.html) and [Photon](https://docs.databricks.com/compute/photon.html) for all transformations and queries. \nDue to its simplicity, the declarative framework DLT ([Delta Live Tables](https://docs.databricks.com/delta-live-tables/index.html)) is a good choice for building reliable, maintainable, and testable data processing pipelines. 
\nPowered by Apache Spark and Photon, the Databricks Data Intelligence Platform supports both types of workloads: SQL queries via [SQL warehouses](https://docs.databricks.com/compute/sql-warehouse/index.html), and SQL, Python and Scala workloads via workspace [clusters](https://docs.databricks.com/compute/index.html). \nFor data science (ML Modeling and [Gen AI](https://docs.databricks.com/generative-ai/generative-ai.html)), the Databricks [AI and Machine Learning platform](https://docs.databricks.com/machine-learning/index.html) provides specialized ML runtimes for [AutoML](https://docs.databricks.com/machine-learning/automl/index.html) and for coding ML jobs. All data science and [MLOps workflows](https://docs.databricks.com/machine-learning/mlops/mlops-workflow.html) are best supported by [MLflow](https://docs.databricks.com/mlflow/index.html).\n* **Serve** \nFor DWH and BI use cases, the Databricks lakehouse provides [Databricks SQL](https://docs.databricks.com/sql/index.html), the data warehouse powered by [SQL warehouses](https://docs.databricks.com/compute/sql-warehouse/index.html) and [serverless SQL warehouses](https://docs.databricks.com/admin/sql/serverless.html). \nFor machine learning, [model serving](https://docs.databricks.com/machine-learning/model-serving/index.html) is a scalable, real-time, enterprise-grade model serving capability hosted in the Databricks control plane. \nOperational databases: [External systems](https://docs.databricks.com/connect/external-systems/index.html), such as operational databases, can be used to store and deliver final data products to user applications. \nCollaboration: Business partners get secure access to the data they need through [Delta Sharing](https://docs.databricks.com/data-sharing/index.html). 
Based on Delta Sharing, the [Databricks Marketplace](https://docs.databricks.com/marketplace/index.html) is an open forum for exchanging data products.\n* **Analysis** \nThe final business applications are in this swim lane. Examples include custom clients such as AI applications connected to [Databricks Model Serving](https://docs.databricks.com/machine-learning/model-serving/index.html) for real-time inference or applications that access data pushed from the lakehouse to an operational database. \nFor BI use cases, analysts typically use [BI tools to access the data warehouse](https://docs.databricks.com/partner-connect/bi.html). SQL developers can additionally use the [Databricks SQL Editor](https://docs.databricks.com/sql/user/sql-editor/index.html) (not shown in the diagram) for queries and dashboarding. \nThe Data Intelligence Platform also offers [dashboards](https://docs.databricks.com/dashboards/index.html) to build data visualizations and share insights.\n\n", "chunk_id": "3c26fcc9f968753eef0d1694c764bcc3", "url": "https://docs.databricks.com/lakehouse-architecture/reference.html"} +{"chunked_text": "# Introduction to the well-architected data lakehouse\n### Download lakehouse reference architectures\n#### Capabilities for your workloads\n\nIn addition, the Databricks lakehouse comes with management capabilities that support all workloads: \n* **Data and AI governance** \nThe central data and AI governance system in the Databricks Data Intelligence Platform is [Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/index.html). Unity Catalog provides a single place to manage data access policies that apply across all workspaces and supports all assets created or used in the lakehouse, such as tables, volumes, features ([feature store](https://docs.databricks.com/machine-learning/feature-store/index.html)), and models ([model registry](https://docs.databricks.com/machine-learning/manage-model-lifecycle/index.html)). 
Unity Catalog can also be used to [capture runtime data lineage](https://docs.databricks.com/data-governance/unity-catalog/data-lineage.html) across queries run on Databricks. \nDatabricks [lakehouse monitoring](https://docs.databricks.com/lakehouse-monitoring/index.html) allows you to monitor the quality of the data in all of the tables in your account. It can also track the performance of [machine learning models and model-serving endpoints](https://docs.databricks.com/machine-learning/model-serving/monitor-diagnose-endpoints.html). \nFor Observability, [system tables](https://docs.databricks.com/admin/system-tables/index.html) is a Databricks-hosted analytical store of your account\u2019s operational data. System tables can be used for historical observability across your account.\n* **Data intelligence engine** \nThe Databricks Data Intelligence Platform allows your entire organization to use data and AI. It is powered by [DatabricksIQ](https://docs.databricks.com/databricksiq/index.html) and combines generative AI with the unification benefits of a lakehouse to understand the unique semantics of your data. \nThe [Databricks Assistant](https://docs.databricks.com/notebooks/databricks-assistant-faq.html) is available in Databricks notebooks, SQL editor, and file editor as a context-aware AI assistant for developers. \n* **Orchestration** \n[Databricks Workflows](https://docs.databricks.com/workflows/index.html) orchestrate data processing, machine learning, and analytics pipelines in the Databricks Data Intelligence Platform. 
Workflows has fully managed orchestration services integrated into the Databricks platform, including [Databricks Jobs](https://docs.databricks.com/workflows/index.html#what-is-databricks-jobs) to run non-interactive code in your Databricks workspace and [Delta Live Tables](https://docs.databricks.com/delta-live-tables/index.html) to build reliable and maintainable ETL pipelines.\n\n", "chunk_id": "95c2d07889b26b243d43b632ba0fe699", "url": "https://docs.databricks.com/lakehouse-architecture/reference.html"} +{"chunked_text": "# Introduction to the well-architected data lakehouse\n### Download lakehouse reference architectures\n#### The Data Intelligence Platform reference architecture on AWS\n\nThe AWS reference architecture is derived from the [generic reference architecture](https://docs.databricks.com/lakehouse-architecture/reference.html#gen-ref-arch) by adding AWS-specific services for the Source, Ingest, Serve, Analysis, and Storage elements. \n![Reference architecture for the Databricks lakehouse on AWS](https://docs.databricks.com/_images/ref-arch-overview-aws.png) \n**[Download: Reference architecture for the Databricks lakehouse on AWS](https://docs.databricks.com/_extras/documents/reference-architecture-databricks-on-aws.pdf)** \nThe AWS reference architecture shows the following AWS-specific services for Ingest, Storage, Serve, and Analysis/Output: \n* Amazon Redshift as a source for Lakehouse Federation\n* Amazon AppFlow and AWS Glue for batch ingest\n* AWS IoT Core, Amazon Kinesis, and AWS DMS for streaming ingest\n* Amazon S3 as the object storage\n* Amazon RDS and Amazon DynamoDB as operational databases\n* Amazon QuickSight as BI tool\n* Amazon Bedrock as a unified API to foundation models from leading AI startups and Amazon \nNote \n* This view of the reference architecture focuses only on AWS services and the Databricks lakehouse. 
The lakehouse on Databricks is an open platform that integrates with a [large ecosystem of partner tools](https://docs.databricks.com/integrations/index.html).\n* The cloud provider services shown are not exhaustive. They are selected to illustrate the concept.\n\n", "chunk_id": "7855e85c3b1efe6a57c0c339e7154b6c", "url": "https://docs.databricks.com/lakehouse-architecture/reference.html"} +{"chunked_text": "# Introduction to the well-architected data lakehouse\n### Download lakehouse reference architectures\n#### Use case: Batch ETL\n\n![Batch ETL reference architecture for Databricks on AWS](https://docs.databricks.com/_images/aws-ref-arch-batch.png) \n**[Download: Batch ETL reference architecture for Databricks on AWS](https://docs.databricks.com/_extras/documents/reference-use-case-batch-for-aws.pdf)** \nIngest tools use source-specific adapters to read data from the source and then either store it in the cloud storage from where Auto Loader can read it, or call Databricks directly (for example, with partner ingest tools integrated into the Databricks lakehouse). To load the data, the Databricks ETL and processing engine - via DLT - runs the queries. Single or multitask jobs can be orchestrated by Databricks workflows and governed by Unity Catalog (access control, audit, lineage, and so on). 
If low-latency operational systems require access to specific golden tables, they can be exported to an operational database such as an RDBMS or key-value store at the end of the ETL pipeline.\n\n### Download lakehouse reference architectures\n#### Use case: Streaming and change data capture (CDC)\n\n![Spark structured streaming architecture on Databricks on AWS](https://docs.databricks.com/_images/aws-ref-arch-streaming-cdc.png) \n**[Download: Spark structured streaming architecture for Databricks on AWS](https://docs.databricks.com/_extras/documents/reference-use-case-streaming-cdc-for-aws.pdf)** \nThe Databricks ETL engine uses Spark Structured Streaming to read from event queues such as Apache Kafka or Amazon Kinesis. The downstream steps follow the approach of the Batch use case above. \nReal-time change data capture (CDC) typically uses an event queue to store the extracted events. From there, the use case follows the streaming use case. \nIf CDC is done in batch, with the extracted records stored in cloud storage first, then Databricks Auto Loader can read them and the use case follows Batch ETL.\n\n", "chunk_id": "cd55e6c23d0dd2afdefc6bbfb917d6bf", "url": "https://docs.databricks.com/lakehouse-architecture/reference.html"} +{"chunked_text": "# Introduction to the well-architected data lakehouse\n### Download lakehouse reference architectures\n#### Use case: Machine learning and AI\n\n![Machine learning and AI reference architecture for Databricks on AWS](https://docs.databricks.com/_images/aws-ref-arch-ai.png) \n**[Download: Machine learning and AI reference architecture for Databricks on AWS](https://docs.databricks.com/_extras/documents/reference-use-case-ai-for-aws.pdf)** \nFor machine learning, the Databricks Data Intelligence Platform provides Mosaic AI, which comes with state-of-the-art machine and deep learning libraries. 
It provides capabilities such as Feature Store and model registry (both integrated into Unity Catalog), low-code features with AutoML, and MLflow integration into the data science lifecycle. \nAll data science-related assets (tables, features, and models) are governed by Unity Catalog and data scientists can use Databricks Workflows to orchestrate their jobs. \nFor deploying models in a scalable and enterprise-grade way, use the MLOps capabilities to publish the models in model serving.\n\n### Download lakehouse reference architectures\n#### Use case: Retrieval Augmented Generation (Gen AI)\n\n![Gen AI RAG reference architecture for Databricks on AWS](https://docs.databricks.com/_images/aws-ref-arch-ai-rag.png) \n**[Download: Gen AI RAG reference architecture for Databricks on AWS](https://docs.databricks.com/_extras/documents/reference-use-case-gen-ai-rag-for-aws.pdf)** \nFor generative AI use cases, Mosaic AI comes with state-of-the-art libraries and specific Gen AI capabilities from prompt engineering to fine-tuning of existing models and pre-training from scratch. The above architecture shows an example of how vector search can be integrated to create a RAG (retrieval augmented generation) AI application. 
\nFor deploying models in a scalable and enterprise-grade way, use the MLOps capabilities to publish the models in model serving.\n\n", "chunk_id": "9d23c9f42a198d5e5b15ae0d8831798e", "url": "https://docs.databricks.com/lakehouse-architecture/reference.html"} +{"chunked_text": "# Introduction to the well-architected data lakehouse\n### Download lakehouse reference architectures\n#### Use case: BI and SQL analytics\n\n![BI and SQL analytics reference architecture for Databricks on AWS](https://docs.databricks.com/_images/aws-ref-arch-bi.png) \n**[Download: BI and SQL analytics reference architecture for Databricks on AWS](https://docs.databricks.com/_extras/documents/reference-use-case-bi-for-aws.pdf)** \nFor BI use cases, business analysts can use dashboards, the Databricks SQL editor, or specific BI tools such as Tableau or Amazon QuickSight. In all cases, the engine is Databricks SQL (serverless or non-serverless), and data discovery, exploration, lineage, and access control are provided by Unity Catalog.\n\n### Download lakehouse reference architectures\n#### Use case: Lakehouse federation\n\n![Lakehouse federation reference architecture for Databricks on AWS](https://docs.databricks.com/_images/aws-ref-arch-federation.png) \n**[Download: Lakehouse federation reference architecture for Databricks on AWS](https://docs.databricks.com/_extras/documents/reference-use-case-federation-for-aws.pdf)** \nLakehouse federation allows external SQL databases (such as MySQL, Postgres, or Redshift) to be integrated with Databricks. \nAll workloads (AI, DWH, and BI) can benefit from this without the need to ETL the data into object storage first. 
The external source catalog is mapped into Unity Catalog, and fine-grained access control can be applied to access through the Databricks platform.\n\n", "chunk_id": "946f4bc05087e68b6e3ed092c3a448b0", "url": "https://docs.databricks.com/lakehouse-architecture/reference.html"} +{"chunked_text": "# Introduction to the well-architected data lakehouse\n### Download lakehouse reference architectures\n#### Use case: Enterprise data sharing\n\n![Enterprise data sharing reference architecture for Databricks on AWS](https://docs.databricks.com/_images/aws-ref-arch-collaboration.png) \n**[Download: Enterprise data sharing reference architecture for Databricks on AWS](https://docs.databricks.com/_extras/documents/reference-use-case-collaboration-for-aws.pdf)** \nEnterprise-grade data sharing is provided by Delta Sharing. It provides direct access to data in the object store secured by Unity Catalog, and Databricks Marketplace is an open forum for exchanging data products.\n\n", "chunk_id": "250fdc9a37afbc6ffdb8c535b3bebcb2", "url": "https://docs.databricks.com/lakehouse-architecture/reference.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n### Error classes in Databricks\n##### DC\\_GA4\\_RAW\\_DATA\\_ERROR error class\n\n[SQLSTATE: KD000](https://docs.databricks.com/error-messages/sqlstates.html#class-kd-datasource-specific-errors) \nError happened in GA4 raw data connector calls, errorCode: ``.\n\n##### DC\\_GA4\\_RAW\\_DATA\\_ERROR error class\n###### JOBS\\_API\\_READ\\_ROWS\\_FAILED\n\nFailed to read rows from table `` using jobs API.\n\n##### DC\\_GA4\\_RAW\\_DATA\\_ERROR error class\n###### JOBS\\_API\\_SQL\\_EXECUTION\\_FAILED\n\nFailed to execute jobs API for the SQL query ``.\n\n##### DC\\_GA4\\_RAW\\_DATA\\_ERROR error class\n###### JOBS\\_API\\_TEMPORARY\\_TABLE\\_FETCH\\_FAILED\n\nFailed to fetch temporary table ``.\n\n##### DC\\_GA4\\_RAW\\_DATA\\_ERROR error class\n###### 
STORAGE\\_API\\_CREATE\\_READ\\_SESSION\\_FAILED\n\nFailed to create read session while reading incremental table `` for table ``\n\n##### DC\\_GA4\\_RAW\\_DATA\\_ERROR error class\n###### STORAGE\\_API\\_READ\\_ROWS\\_FAILED\n\nFailed to read rows from storage API while reading incremental table `` for table ``\n\n##### DC\\_GA4\\_RAW\\_DATA\\_ERROR error class\n###### STORAGE\\_API\\_READ\\_ROWS\\_RESPONSE\\_FAILED\n\nFailed to read rows response while reading incremental table `` for table ``\n\n", "chunk_id": "d97c6802b8440b7ca97dac3971244380", "url": "https://docs.databricks.com/error-messages/dc-ga4-raw-data-error-error-class.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `input_file_name` function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nReturns the name of the file being read, or empty string if not available. \nThis function is not available on Unity Catalog. \nIn Databricks SQL and Databricks Runtime 13.3 LTS and above this function is deprecated. Please use [\\_metadata.file\\_name](https://docs.databricks.com/ingestion/file-metadata-column.html).\n\n####### `input_file_name` function\n######## Syntax\n\n```\ninput_file_name()\n\n```\n\n####### `input_file_name` function\n######## Arguments\n\nThis function takes no arguments.\n\n####### `input_file_name` function\n######## Returns\n\nA STRING. \nIf the information is not available an empty string is returned. 
\nThe function is non-deterministic.\n\n####### `input_file_name` function\n######## Examples\n\n```\n> SELECT input_file_name();\n\n```\n\n####### `input_file_name` function\n######## Related functions\n\n* [\\_metadata](https://docs.databricks.com/ingestion/file-metadata-column.html)\n\n", "chunk_id": "44a944cd976fb4a7c72d809dbaab8dd2", "url": "https://docs.databricks.com/sql/language-manual/functions/input_file_name.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `last_day` function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nReturns the last day of the month that the date belongs to.\n\n####### `last_day` function\n######## Syntax\n\n```\nlast_day(expr)\n\n```\n\n####### `last_day` function\n######## Arguments\n\n* `expr`: A DATE expression.\n\n####### `last_day` function\n######## Returns\n\nA DATE.\n\n####### `last_day` function\n######## Examples\n\n```\n> SELECT last_day('2009-01-12');\n2009-01-31\n\n```\n\n####### `last_day` function\n######## Related functions\n\n* [next\\_day function](https://docs.databricks.com/sql/language-manual/functions/next_day.html)\n\n", "chunk_id": "b8c7ed30909762f9a4dbdbc71330d3e2", "url": "https://docs.databricks.com/sql/language-manual/functions/last_day.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### February 2018\n\nReleases are staged. Your Databricks account may not be updated until a week after the initial\nrelease date. 
\nNote \nWe are now providing Databricks Runtime deprecation notices in [Databricks Runtime release notes versions and compatibility](https://docs.databricks.com/release-notes/runtime/index.html).\n\n#### February 2018\n##### New line chart supports time-series data\n\n**Feb 27 - Mar 6, 2018: Version 2.66** \nA new line chart fully supports time-series data and resolves limitations with our old line chart option. The old line chart is deprecated, and we recommend that users migrate any visualizations that use the old line chart to the new one. \n![Line Chart](https://docs.databricks.com/_images/line-chart.png) \nSee [Migrate legacy line charts](https://docs.databricks.com/visualizations/legacy-charts.html) for more information.\n\n#### February 2018\n##### More visualization improvements\n\n**Feb 27 - Mar 6, 2018: Version 2.66** \nYou can now sort columns in table output and use more than 10 legend items in a chart.\n\n#### February 2018\n##### Delete job runs using Job API\n\n**Feb 27 - Mar 6, 2018: Version 2.66** \nYou can now use the Job API to delete job runs, using the new `jobs/runs/delete` endpoint. \nSee [Runs delete](https://docs.databricks.com/workflows/jobs/jobs-2.0-api.html#jobsjobsservicedeleterun) for more information.\n\n#### February 2018\n##### Bring your own S3 bucket\n\n**Feb 27 - Mar 6, 2018: Version 2.66** \nNew customers can now host their DBFS (Workspace) root, which stores account-wide assets, such as libraries, in their own AWS S3 bucket. 
\nSee [Configure AWS storage (legacy)](https://docs.databricks.com/archive/admin-guide/aws-storage.html) for more information.\n\n", "chunk_id": "e2dce996deb234bbb6eb2c0b951c891c", "url": "https://docs.databricks.com/release-notes/product/2018/february.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### February 2018\n##### KaTeX math rendering library updated\n\n**Feb 27 - Mar 6, 2018: Version 2.66** \nThe version of KaTeX that Databricks uses for math equation rendering was updated from 0.5.1 to 0.9.0-beta1. \nThis update introduces changes that can break expressions that were written in 0.5.1: \n* `\\xLongequal` is now `\\xlongequal` (#997)\n* `[text]color` HTML colors must be well-formed. (#827)\n* `\\llap` and `\\rlap` now render contents in math mode. Use `\\mathllap` (new) and `\\mathrlap` (new) to provide the previous behavior.\n* `\\color` and `\\textcolor` now behave as they do in LaTeX (#619) \nSee the [KaTeX release notes](https://github.com/Khan/KaTeX/releases) for more information.\n\n#### February 2018\n##### Databricks CLI: 0.5.0 release\n\n**February 27, 2018: databricks-cli 0.5.0** \nDatabricks CLI now supports commands that target the [Libraries API](https://docs.databricks.com/api/workspace/libraries). \nThe CLI also now supports multiple connection profiles. Connection profiles can be used to configure the CLI to talk\nto multiple Databricks deployments. \nSee [Databricks CLI (legacy)](https://docs.databricks.com/archive/dev-tools/cli/index.html) for more information.\n\n", "chunk_id": "8377fabc592eae6b2b7cde52467e989b", "url": "https://docs.databricks.com/release-notes/product/2018/february.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### February 2018\n##### DBUtils API library\n\n**Feb 13-20, 2018: Version 2.65** \nDatabricks provides a variety of utility APIs that let you work easily with DBFS, notebook workflows, and widgets. 
The `dbutils-api` library accelerates application development by\nallowing you to locally compile and run unit tests against these utility APIs before deploying your application to a Databricks cluster. \nSee [Databricks Utilities (dbutils) reference](https://docs.databricks.com/dev-tools/databricks-utils.html) for more information.\n\n#### February 2018\n##### Filter for your jobs only\n\n**Feb 13-20, 2018: Version 2.65** \nNew filters on the Jobs list let you display only the jobs you own and only the jobs you have access to. \n![Job Filters](https://docs.databricks.com/_images/jobs-filter-my.png) \nSee [Create and run Databricks Jobs](https://docs.databricks.com/workflows/jobs/create-run-jobs.html) for more information.\n\n#### February 2018\n##### Spark-submit from the Create Job page\n\n**Feb 13-20, 2018: Version 2.65** \nNow you can configure `spark-submit` parameters from the Create Job page, as well as through the REST API or CLI. \n![Spark-submit](https://docs.databricks.com/_images/jobs-spark-submit-params.png) \nSee [Create and run Databricks Jobs](https://docs.databricks.com/workflows/jobs/create-run-jobs.html) for more information.\n\n#### February 2018\n##### Select Python 3 from the Create Cluster page\n\n**Feb 13-20, 2018: Version 2.65** \nNow you can specify Python version 2 or 3 from the new Python version drop-down when you create a cluster. If you don\u2019t make a selection, Python 2 is the default. You can also, as before, create Python 3 clusters using the REST API.\n\n", "chunk_id": "fc8bcc410e327eda97c846c7d23fedfa", "url": "https://docs.databricks.com/release-notes/product/2018/february.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### February 2018\n##### Workspace UI improvements\n\n**Feb 13-20, 2018: Version 2.65** \nWe have added the ability to sort files by type (folders, notebooks, libraries) in the Workspace file browser, and the home folder always appears at the top of the Users list. 
\n![Workspace Sort](https://docs.databricks.com/_images/workspace-sort.png)\n\n#### February 2018\n##### Autocomplete for SQL commands and database names\n\n**Feb 13-20, 2018: Version 2.65** \nSQL cells in notebooks now provide autocompletion of SQL commands and database names.\n\n#### February 2018\n##### Serverless pools now support R\n\n**Feb 1-8, 2018: Version 2.64** \nYou can now use R in [serverless pools](https://docs.databricks.com/compute/configure.html).\n\n#### February 2018\n##### Distributed TensorFlow and Keras Libraries Support\n\n**Feb 1-8, 2018: Version 2.64** \nDatabricks now supports three frameworks for distributed TensorFlow training: Horovod, TensorFlowOnSpark, and dist-Keras. See [Distributed training](https://docs.databricks.com/machine-learning/train-model/distributed-training/index.html) for detailed documentation on installation as well as example workflows.\n\n#### February 2018\n##### XGBoost available as a Spark Package\n\n**Feb 1-8, 2018: Version 2.64** \nXGBoost\u2019s Spark integration library can now be installed on Databricks as a Spark Package from the Library UI or the REST API. Previously XGBoost required installation from source via init scripts and thus a longer cluster start-up time. See [Use XGBoost on Databricks](https://docs.databricks.com/machine-learning/train-model/xgboost.html) for more information.\n\n", "chunk_id": "03b253b816a0f5ffaf3a3256155abae7", "url": "https://docs.databricks.com/release-notes/product/2018/february.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### February 2018\n##### Table access control for SQL and Python (Beta)\n\n**Feb 1-8, 2018: Version 2.64** \nLast year, we introduced data object access control for SQL users. Today we are excited to announce the public beta release of table access control (table ACLs) for both SQL and Python users. 
With table access control, you can restrict access to securable objects like tables, databases, views, or functions. You can also provide fine-grained access control (to rows and columns matching specific conditions, for example) by setting permissions on derived views containing arbitrary queries. \nNote \n* This feature is in public beta\n* This feature requires Databricks Runtime 3.5+. \nSee [Hive metastore privileges and securable objects (legacy)](https://docs.databricks.com/data-governance/table-acls/object-privileges.html) for more information.\n\n", "chunk_id": "e0127acece8bde42dcd073043ba51afe", "url": "https://docs.databricks.com/release-notes/product/2018/february.html"} +{"chunked_text": "# Databricks data engineering\n## Git integration with Databricks Git folders\n### Limits & FAQ for Git integration with Databricks Git folders\n##### Errors and troubleshooting for Databricks Git folders\n\nFollow the guidance below to respond to common error messages or to troubleshoot issues with Databricks Git folders.\n\n##### Errors and troubleshooting for Databricks Git folders\n###### `Invalid credentials`\n\nTry the following: \n* Confirm that the Git integration settings (**Settings** > **Linked accounts**) are correct. \n+ You must enter both your Git provider username and token.\n* Confirm that you have selected the correct Git provider in [\\*\\*Settings\\*\\* > \\*\\*Linked accounts\\*\\*](https://docs.databricks.com/repos/repos-setup.html).\n* Ensure your personal access token or app password has the correct repo access.\n* If SSO is enabled on your Git provider, authorize your tokens for SSO.\n* Test your token with the Git command line. Replace the text strings in angle brackets: \n```\ngit clone https://:@github.com//.git\n\n```\n\n##### Errors and troubleshooting for Databricks Git folders\n###### `Secure connection...SSL problems`\n\nThis error occurs if your Git server is not accessible from Databricks. 
To access a private Git server, get in touch with your Databricks account team. \n```\n: Secure connection to could not be established because of SSL problems\n\n```\n\n##### Errors and troubleshooting for Databricks Git folders\n###### Timeout errors\n\nExpensive operations such as cloning a large repo or checking out a large branch might result in timeout errors, but the operation might complete in the background. You can also try again later if the workspace was under heavy load at the time. \nTo work with a large repo, try [sparse checkout](https://docs.databricks.com/repos/git-operations-with-repos.html#sparse).\n\n", "chunk_id": "170baa2914b35d4fb59d1380afa9e1a0", "url": "https://docs.databricks.com/repos/errors-troubleshooting.html"} +{"chunked_text": "# Databricks data engineering\n## Git integration with Databricks Git folders\n### Limits & FAQ for Git integration with Databricks Git folders\n##### Errors and troubleshooting for Databricks Git folders\n###### 404 errors\n\nIf you get a 404 error when you try to open a non-notebook file, try waiting a few minutes and then trying again. There is a delay of a few minutes between when the workspace is enabled and when the webapp picks up the configuration flag.\n\n##### Errors and troubleshooting for Databricks Git folders\n###### Detached head state\n\nA Databricks Git folder can get into the detached head state if: \n* **The remote branch is deleted**. Databricks tries to recover the uncommitted local changes on the branch by applying those changes to the default branch. If the default branch has conflicting changes, Databricks applies the changes on a snapshot of the default branch (detached head).\n* A user or service principal checked out a remote repo on a tag using the [`update repo` API](https://docs.databricks.com/api/workspace/repos/update). \nTo recover from this state: \n1. 
Click the `create branch` button to create a new branch based on the current commit, or use the \u201cSelect branch\u201d dropdown to check out an existing branch.\n2. Commit and push if you want to keep the changes. To discard the changes, click on the kebab under **Changes**.\n\n", "chunk_id": "7b48e8beb8259a1a806d3c75b70a5068", "url": "https://docs.databricks.com/repos/errors-troubleshooting.html"} +{"chunked_text": "# Databricks data engineering\n## Git integration with Databricks Git folders\n### Limits & FAQ for Git integration with Databricks Git folders\n##### Errors and troubleshooting for Databricks Git folders\n###### Resolve notebook name conflicts\n\nDifferent notebooks with identical or similar filenames can cause an error when you create a repo or pull request, such as `Cannot perform Git operation due to conflicting names` or `A folder cannot contain a notebook with the same name as a notebook, file, or folder (excluding file extensions).` \nA naming conflict can occur even with different file extensions. For example, these two files conflict: \n* `notebook.ipynb`\n* `notebook.py` \n![Diagram: Name conflict for notebook, file, or folder.](https://docs.databricks.com/_images/asset-name-conflict.png) \n### To fix the name conflict \n* Rename the notebook, file, or folder contributing to the error state. \n+ If this error occurs when you clone the repo, you need to rename notebooks, files, or folders in the remote Git repo.\n\n##### Errors and troubleshooting for Databricks Git folders\n###### Errors suggest recloning\n\n```\nThere was a problem with deleting folders. The repo could be in an inconsistent state and re-cloning is recommended.\n\n``` \nThis error indicates that a problem occurred while deleting folders from the repo. This could leave the repo in an inconsistent state, where folders that should have been deleted still exist. 
If this error occurs, Databricks recommends deleting and re-cloning the repo to reset its state.\n\n", "chunk_id": "19ef60c602685371d9b908b9c5b6ca5b", "url": "https://docs.databricks.com/repos/errors-troubleshooting.html"} +{"chunked_text": "# Databricks data engineering\n## Git integration with Databricks Git folders\n### Limits & FAQ for Git integration with Databricks Git folders\n##### Errors and troubleshooting for Databricks Git folders\n###### `No experiment...found` or MLflow UI errors\n\nYou might see a Databricks error message `No experiment for node found` or an error in MLflow when you work on an\nMLflow notebook experiment last logged to before the [3.72 platform release](https://docs.databricks.com/release-notes/product/2022/may.html#databricks-repos-fix-to-issue-with-mlflow-experiment-data-loss).\nTo resolve the error, log a new run in the notebook associated with that experiment. \nNote \nThis applies only to notebook experiments. Creation of new experiments in Git folders is [unsupported](https://docs.databricks.com/repos/limits.html#can-i-create-an-mlflow-experiment-in-a-repo).\n\n", "chunk_id": "75ee9bb5e55a05a0bd116d801bf768d2", "url": "https://docs.databricks.com/repos/errors-troubleshooting.html"} +{"chunked_text": "# Databricks data engineering\n## Git integration with Databricks Git folders\n### Limits & FAQ for Git integration with Databricks Git folders\n##### Errors and troubleshooting for Databricks Git folders\n###### Notebooks appear as modified without any visible user edits\n\nIf every line of a notebook appears modified without any user edits, the modifications may be changes in line ending characters. Databricks uses Linux-style LF line ending characters, which may differ from line endings in files committed from Windows systems. \nIf your notebook shows as modified but you can\u2019t see any obvious user edits, the \u201cmodifications\u201d may be changes to the normally invisible \u201cend of line\u201d characters. 
End-of-line characters can be different across operating systems and file formats. \nTo diagnose this issue, check if you have a `.gitattributes` file. If you do: \n* It must not contain `* text eol=crlf`.\n* If you are **not** using Windows as your environment, remove the setting. Both your native development environment and Databricks use Linux end-of-line characters.\n* If you **are** using Windows, change the setting to `* text=auto`. Git will now internally store all files with Linux-style line endings, but will checkout to platform-specific (such as Windows) end-of-line characters automatically. \nIf you have already committed files with Windows end-of-line characters into Git, perform the following steps: \n1. Clear any outstanding changes.\n2. Update the `.gitattributes` file with the recommendation above. Commit the change.\n3. Run `git add --renormalize`. Commit and push all changes.\n\n", "chunk_id": "85cc99a69a2a9937c9c99a321b182f9f", "url": "https://docs.databricks.com/repos/errors-troubleshooting.html"} +{"chunked_text": "# Databricks release notes\n", "chunk_id": "5490dffb38fe33c84f1abd96884b5e35", "url": "https://docs.databricks.com/release-notes/product/index.html"} +{"chunked_text": "# Databricks release notes\n### Databricks platform release notes\n\n* [May 2024](https://docs.databricks.com/release-notes/product/2024/may.html)\n+ [Compute plane outbound IP addresses must be added to a workspace IP allow list](https://docs.databricks.com/release-notes/product/2024/may.html#compute-plane-outbound-ip-addresses-must-be-added-to-a-workspace-ip-allow-list)\n+ [OAuth is supported in Lakehouse Federation for Snowflake](https://docs.databricks.com/release-notes/product/2024/may.html#oauth-is-supported-in-lakehouse-federation-for-snowflake)\n+ [Bulk move and delete workspace objects from the workspace browser](https://docs.databricks.com/release-notes/product/2024/may.html#bulk-move-and-delete-workspace-objects-from-the-workspace-browser)\n+ [New 
compliance and security settings APIs (Public Preview)](https://docs.databricks.com/release-notes/product/2024/may.html#new-compliance-and-security-settings-apis-public-preview)\n+ [Databricks Runtime 15.2 is GA](https://docs.databricks.com/release-notes/product/2024/may.html#databricks-runtime-152-is-ga)\n+ [New Tableau connector for Delta Sharing](https://docs.databricks.com/release-notes/product/2024/may.html#new-tableau-connector-for-delta-sharing)\n+ [New deep learning recommendation model examples](https://docs.databricks.com/release-notes/product/2024/may.html#new-deep-learning-recommendation-model-examples)\n+ [Bind storage credentials and external locations to specific workspaces (Public Preview)](https://docs.databricks.com/release-notes/product/2024/may.html#bind-storage-credentials-and-external-locations-to-specific-workspaces-public-preview)\n+ [Git folders are GA](https://docs.databricks.com/release-notes/product/2024/may.html#git-folders-are-ga)\n+ [Pre-trained models in Unity Catalog (Public Preview)](https://docs.databricks.com/release-notes/product/2024/may.html#pre-trained-models-in-unity-catalog-public-preview)\n+ [Databricks Vector Search is GA](https://docs.databricks.com/release-notes/product/2024/may.html#databricks-vector-search-is-ga)\n+ [The compliance security profile now supports AWS Graviton instance types](https://docs.databricks.com/release-notes/product/2024/may.html#the-compliance-security-profile-now-supports-aws-graviton-instance-types)\n+ [Databricks Assistant autocomplete (Public Preview)](https://docs.databricks.com/release-notes/product/2024/may.html#databricks-assistant-autocomplete-public-preview)\n+ [Meta Llama 3 support in Foundation Model Training](https://docs.databricks.com/release-notes/product/2024/may.html#meta-llama-3-support-in-foundation-model-training)\n+ [New changes to Git folder UI](https://docs.databricks.com/release-notes/product/2024/may.html#new-changes-to-git-folder-ui)\n+ [Compute now uses EBS GP3 
volumes for autoscaling local storage](https://docs.databricks.com/release-notes/product/2024/may.html#compute-now-uses-ebs-gp3-volumes-for-autoscaling-local-storage)\n+ [Unified Login now supported with AWS PrivateLink (Private Preview)](https://docs.databricks.com/release-notes/product/2024/may.html#unified-login-now-supported-with-aws-privatelink-private-preview)\n+ [Foundation Model Training (Public Preview)](https://docs.databricks.com/release-notes/product/2024/may.html#foundation-model-training-public-preview)\n+ [Attribute tag values for Unity Catalog objects can now be 1000 characters long (Public Preview)](https://docs.databricks.com/release-notes/product/2024/may.html#attribute-tag-values-for-unity-catalog-objects-can-now-be-1000-characters-long-public-preview)\n+ [New Previews page](https://docs.databricks.com/release-notes/product/2024/may.html#new-previews-page)\n+ [New capabilities for Databricks Vector Search](https://docs.databricks.com/release-notes/product/2024/may.html#new-capabilities-for-databricks-vector-search)\n+ [Credential passthrough and Hive metastore table access controls are deprecated](https://docs.databricks.com/release-notes/product/2024/may.html#credential-passthrough-and-hive-metastore-table-access-controls-are-deprecated)\n+ [Databricks JDBC driver 2.6.38](https://docs.databricks.com/release-notes/product/2024/may.html#databricks-jdbc-driver-2638)\n+ [Databricks Runtime 15.2 (Beta)](https://docs.databricks.com/release-notes/product/2024/may.html#databricks-runtime-152-beta)\n+ [Notebooks now detect and auto-complete column names for Spark Connect DataFrames](https://docs.databricks.com/release-notes/product/2024/may.html#notebooks-now-detect-and-auto-complete-column-names-for-spark-connect-dataframes)\n* [April 2024](https://docs.databricks.com/release-notes/product/2024/april.html)\n+ [Databricks Runtime 15.1 is GA](https://docs.databricks.com/release-notes/product/2024/april.html#databricks-runtime-151-is-ga)\n+ [Databricks 
Assistant: Threads & history](https://docs.databricks.com/release-notes/product/2024/april.html#databricks-assistant-threads--history)\n+ [Cancel pending serving endpoint updates in Model Serving](https://docs.databricks.com/release-notes/product/2024/april.html#cancel-pending-serving-endpoint-updates-in-model-serving)\n+ [Data lineage now captures reads on tables with column masks and row-level security](https://docs.databricks.com/release-notes/product/2024/april.html#data-lineage-now-captures-reads-on-tables-with-column-masks-and-row-level-security)\n+ [Meta Llama 3 is supported in Model Serving for AWS](https://docs.databricks.com/release-notes/product/2024/april.html#meta-llama-3-is-supported-in-model-serving-for-aws)\n+ [Notebooks now automatically detect SQL](https://docs.databricks.com/release-notes/product/2024/april.html#notebooks-now-automatically-detect-sql)\n+ [New columns added to the billable usage system table (Public Preview)](https://docs.databricks.com/release-notes/product/2024/april.html#new-columns-added-to-the-billable-usage-system-table-public-preview)\n+ [Delta Sharing supports tables that use column mapping (Public Preview)](https://docs.databricks.com/release-notes/product/2024/april.html#delta-sharing-supports-tables-that-use-column-mapping-public-preview)\n+ [Get serving endpoint schemas (Public Preview)](https://docs.databricks.com/release-notes/product/2024/april.html#get-serving-endpoint-schemas-public-preview)\n+ [Creation and installation of workspace libraries is no longer available](https://docs.databricks.com/release-notes/product/2024/april.html#creation-and-installation-of-workspace-libraries-is-no-longer-available)\n+ [Jobs created through the UI are now queued by default](https://docs.databricks.com/release-notes/product/2024/april.html#jobs-created-through-the-ui-are-now-queued-by-default)\n+ [Configuring access to resources from serving endpoints is 
GA](https://docs.databricks.com/release-notes/product/2024/april.html#configuring-access-to-resources-from-serving-endpoints-is-ga)\n+ [Serverless compute for workflows is in public preview](https://docs.databricks.com/release-notes/product/2024/april.html#serverless-compute-for-workflows-is-in-public-preview)\n+ [Lakehouse Federation supports foreign tables with case-sensitive identifiers](https://docs.databricks.com/release-notes/product/2024/april.html#lakehouse-federation-supports-foreign-tables-with-case-sensitive-identifiers)\n+ [Compute cloning now clones any libraries installed on the original compute](https://docs.databricks.com/release-notes/product/2024/april.html#compute-cloning-now-clones-any-libraries-installed-on-the-original-compute)\n+ [Route optimization is available for serving endpoints](https://docs.databricks.com/release-notes/product/2024/april.html#route-optimization-is-available-for-serving-endpoints)\n+ [Delta Live Tables notebook developer experience improvements (Public Preview)](https://docs.databricks.com/release-notes/product/2024/april.html#delta-live-tables-notebook-developer-experience-improvements-public-preview)\n+ [Databricks on AWS GovCloud (Public Preview)](https://docs.databricks.com/release-notes/product/2024/april.html#databricks-on-aws-govcloud-public-preview)\n* [March 2024](https://docs.databricks.com/release-notes/product/2024/march.html)\n+ [DBRX Base and DBRX Instruct are now available in Model Serving](https://docs.databricks.com/release-notes/product/2024/march.html#dbrx-base-and-dbrx-instruct-are-now-available-in-model-serving)\n+ [Model Serving is HIPAA compliant in all regions](https://docs.databricks.com/release-notes/product/2024/march.html#model-serving-is-hipaa-compliant-in-all-regions)\n+ [Provisioned throughput in Foundation Model APIs is GA and HIPAA compliant](https://docs.databricks.com/release-notes/product/2024/march.html#provisioned-throughput-in-foundation-model-apis-is-ga-and-hipaa-compliant)\n+ 
[MLflow now enforces quota limits for experiments and runs](https://docs.databricks.com/release-notes/product/2024/march.html#mlflow-now-enforces-quota-limits-for-experiments-and-runs)\n+ [The Jobs UI is updated to better manage jobs deployed by Databricks Asset Bundles](https://docs.databricks.com/release-notes/product/2024/march.html#the-jobs-ui-is-updated-to-better-manage-jobs-deployed-by-databricks-asset-bundles)\n+ [Google Cloud Vertex AI supported as model provider for external models](https://docs.databricks.com/release-notes/product/2024/march.html#google-cloud-vertex-ai-supported-as-model-provider-for-external-models)\n+ [Access resources from serving endpoints using instance profiles is GA](https://docs.databricks.com/release-notes/product/2024/march.html#access-resources-from-serving-endpoints-using-instance-profiles-is-ga)\n+ [Interactive notebook debugging](https://docs.databricks.com/release-notes/product/2024/march.html#interactive-notebook-debugging)\n+ [Self-service sign-up for private exchange providers in Marketplace](https://docs.databricks.com/release-notes/product/2024/march.html#self-service-sign-up-for-private-exchange-providers-in-marketplace)\n+ [Databricks Runtime 15.0 is GA](https://docs.databricks.com/release-notes/product/2024/march.html#databricks-runtime-150-is-ga)\n+ [Databricks Repos changed to Git folders](https://docs.databricks.com/release-notes/product/2024/march.html#databricks-repos-changed-to-git-folders)\n+ [Databricks Runtime 14.1 and 14.2 series support extended](https://docs.databricks.com/release-notes/product/2024/march.html#databricks-runtime-141-and-142-series-support-extended)\n+ [Databricks ODBC driver 2.8.0](https://docs.databricks.com/release-notes/product/2024/march.html#databricks-odbc-driver-280)\n+ [SQL warehouses for notebooks is GA](https://docs.databricks.com/release-notes/product/2024/march.html#sql-warehouses-for-notebooks-is-ga)\n+ [Delegate the ability to view an object\u2019s metadata in Unity Catalog 
(Public Preview)](https://docs.databricks.com/release-notes/product/2024/march.html#delegate-the-ability-to-view-an-objects-metadata-in-unity-catalog-public-preview)\n+ [Databricks Runtime 15.0 (Beta)](https://docs.databricks.com/release-notes/product/2024/march.html#databricks-runtime-150-beta)\n+ [Databricks Runtime 14.0 series support ends](https://docs.databricks.com/release-notes/product/2024/march.html#databricks-runtime-140-series-support-ends)\n+ [New computation for sys.path and CWD in Repos](https://docs.databricks.com/release-notes/product/2024/march.html#new-computation-for-syspath-and-cwd-in-repos)\n+ [Feature Serving is GA](https://docs.databricks.com/release-notes/product/2024/march.html#feature-serving-is-ga)\n+ [Predictive optimization available in more regions](https://docs.databricks.com/release-notes/product/2024/march.html#predictive-optimization-available-in-more-regions)\n* [February 2024](https://docs.databricks.com/release-notes/product/2024/february.html)\n+ [Use Delta Live Tables in Feature Engineering (Public Preview)](https://docs.databricks.com/release-notes/product/2024/february.html#use-delta-live-tables-in-feature-engineering-public-preview)\n+ [Restrict creating a personal access token for a service principal](https://docs.databricks.com/release-notes/product/2024/february.html#restrict-creating-a-personal-access-token-for-a-service-principal)\n+ [Restrict changing a job owner and the run as setting](https://docs.databricks.com/release-notes/product/2024/february.html#restrict-changing-a-job-owner-and-the-run-as-setting)\n+ [Automatic cluster update is enabled if the compliance security profile is enabled (GA)](https://docs.databricks.com/release-notes/product/2024/february.html#automatic-cluster-update-is-enabled-if-the-compliance-security-profile-is-enabled-ga-for-aws-cluster-update-is-changed-not-new-but-is-ga-now)\n+ [Account admins can enable enhanced security and compliance features (Public 
Preview)](https://docs.databricks.com/release-notes/product/2024/february.html#account-admins-can-enable-enhanced-security-and-compliance-features-public-preview)\n+ [Support for Cloudflare R2 storage to avoid cross-region egress fees (Public Preview)](https://docs.databricks.com/release-notes/product/2024/february.html#support-for-cloudflare-r2-storage-to-avoid-cross-region-egress-fees-public-preview)\n+ [Notebooks for monitoring and managing Delta Sharing egress costs are now available](https://docs.databricks.com/release-notes/product/2024/february.html#notebooks-for-monitoring-and-managing-delta-sharing-egress-costs-are-now-available)\n+ [Add data UI supports XML file format](https://docs.databricks.com/release-notes/product/2024/february.html#add-data-ui-supports-xml-file-format)\n+ [Support for cloud storage firewall from serverless compute (Public Preview)](https://docs.databricks.com/release-notes/product/2024/february.html#support-for-cloud-storage-firewall-from-serverless-compute-public-preview)\n+ [Use AI Functions to invoke a generative AI model from Foundation Model APIs](https://docs.databricks.com/release-notes/product/2024/february.html#use-ai-functions-to-invoke-a-generative-ai-model-from-foundation-model-apis)\n+ [Unity Catalog volumes are GA](https://docs.databricks.com/release-notes/product/2024/february.html#unity-catalog-volumes-are-ga)\n+ [Full-page AI-powered search](https://docs.databricks.com/release-notes/product/2024/february.html#full-page-ai-powered-search)\n+ [Run SQL notebook jobs on a SQL warehouse](https://docs.databricks.com/release-notes/product/2024/february.html#run-sql-notebook-jobs-on-a-sql-warehouse)\n+ [File arrival triggers in Databricks Workflows is GA](https://docs.databricks.com/release-notes/product/2024/february.html#file-arrival-triggers-in-databricks-workflows-is-ga)\n+ [Search for machine learning models in Unity Catalog using global workspace 
search](https://docs.databricks.com/release-notes/product/2024/february.html#search-for-machine-learning-models-in-unity-catalog-using-global-workspace-search)\n+ [Databricks Git server proxy is GA](https://docs.databricks.com/release-notes/product/2024/february.html#databricks-git-server-proxy-is-ga)\n+ [Databricks Git server proxy no longer requires CAN\\_ATTACH\\_TO permissions](https://docs.databricks.com/release-notes/product/2024/february.html#databricks-git-server-proxy-no-longer-requires-can_attach_to-permissions)\n+ [Workspace file support for the dbt and SQL file tasks is GA](https://docs.databricks.com/release-notes/product/2024/february.html#workspace-file-support-for-the-dbt-and-sql-file-tasks-is-ga)\n+ [Databricks Connect is GA for Scala](https://docs.databricks.com/release-notes/product/2024/february.html#databricks-connect-is-ga-for-scala)\n+ [Create tables from files in volumes](https://docs.databricks.com/release-notes/product/2024/february.html#create-tables-from-files-in-volumes)\n+ [Databricks Runtime 14.3 LTS is GA](https://docs.databricks.com/release-notes/product/2024/february.html#databricks-runtime-143-lts-is-ga)\n+ [Delta Sharing supports tables that use deletion vectors (Public Preview)](https://docs.databricks.com/release-notes/product/2024/february.html#delta-sharing-supports-tables-that-use-deletion-vectors-public-preview)\n* [January 2024](https://docs.databricks.com/release-notes/product/2024/january.html)\n+ [Native XML file format support (Public Preview)](https://docs.databricks.com/release-notes/product/2024/january.html#native-xml-file-format-support-public-preview)\n+ [Share AI models using Databricks Marketplace (Public Preview)](https://docs.databricks.com/release-notes/product/2024/january.html#share-ai-models-using-databricks-marketplace-public-preview)\n+ [Workspace path update](https://docs.databricks.com/release-notes/product/2024/january.html#workspace-path-update)\n+ [Streamlined creation of Databricks 
jobs](https://docs.databricks.com/release-notes/product/2024/january.html#streamlined-creation-of-databricks-jobs)\n+ [Monitor GPU model serving workloads using inference tables](https://docs.databricks.com/release-notes/product/2024/january.html#monitor-gpu-model-serving-workloads-using-inference-tables)\n+ [URI path-based access to Unity Catalog external volumes](https://docs.databricks.com/release-notes/product/2024/january.html#uri-path-based-access-to-unity-catalog-external-volumes)\n+ [Access controls lists can be enabled on upgraded workspaces](https://docs.databricks.com/release-notes/product/2024/january.html#access-controls-lists-can-be-enabled-on-upgraded-workspaces)\n+ [Marketplace listing events system table now available (Public Preview)](https://docs.databricks.com/release-notes/product/2024/january.html#marketplace-listing-events-system-table-now-available-public-preview)\n+ [Updated UI for notebook cells (Public Preview)](https://docs.databricks.com/release-notes/product/2024/january.html#updated-ui-for-notebook-cells-public-preview)\n+ [**Quick Fix** help with syntax errors in the notebook](https://docs.databricks.com/release-notes/product/2024/january.html#quick-fix-help-with-syntax-errors-in-the-notebook)\n+ [Databricks Runtime 14.3 LTS (Beta)](https://docs.databricks.com/release-notes/product/2024/january.html#databricks-runtime-143-lts-beta)\n+ [Share AI models using Delta Sharing (Public Preview)](https://docs.databricks.com/release-notes/product/2024/january.html#share-ai-models-using-delta-sharing-public-preview)\n+ [Databricks Marketplace supports volume sharing](https://docs.databricks.com/release-notes/product/2024/january.html#databricks-marketplace-supports-volume-sharing)\n+ [Create widgets from the Databricks UI](https://docs.databricks.com/release-notes/product/2024/january.html#create-widgets-from-the-databricks-ui)\n+ [Warehouse events system table is now available (Public 
Preview)](https://docs.databricks.com/release-notes/product/2024/january.html#warehouse-events-system-table-is-now-available-public-preview)\n+ [UI experience for OAuth app registration](https://docs.databricks.com/release-notes/product/2024/january.html#ui-experience-for-oauth-app-registration)\n+ [Reuse subnets across workspaces for customer-managed VPCs](https://docs.databricks.com/release-notes/product/2024/january.html#reuse-subnets-across-workspaces-for-customer-managed-vpcs)\n+ [Workspace file size limit is now 500MB](https://docs.databricks.com/release-notes/product/2024/january.html#workspace-file-size-limit-is-now-500mb)\n+ [Feature removal notice for legacy Git integration in Databricks](https://docs.databricks.com/release-notes/product/2024/january.html#feature-removal-notice-for-legacy-git-integration-in-databricks)\n+ [Databricks ODBC driver 2.7.7](https://docs.databricks.com/release-notes/product/2024/january.html#databricks-odbc-driver-277)\n+ [Databricks Runtime 13.2 series support ends](https://docs.databricks.com/release-notes/product/2024/january.html#databricks-runtime-132-series-support-ends)\n* [December 2023](https://docs.databricks.com/release-notes/product/2023/december.html)\n+ [Share dynamic views using Delta Sharing (Public Preview)](https://docs.databricks.com/release-notes/product/2023/december.html#share-dynamic-views-using-delta-sharing-public-preview)\n+ [Share volumes using Delta Sharing (Public Preview)](https://docs.databricks.com/release-notes/product/2023/december.html#share-volumes-using-delta-sharing-public-preview)\n+ [Entity Relationship Diagram for primary keys and foreign keys](https://docs.databricks.com/release-notes/product/2023/december.html#entity-relationship-diagram-for-primary-keys-and-foreign-keys)\n+ [Unity Catalog volume file upload size limit increase](https://docs.databricks.com/release-notes/product/2023/december.html#unity-catalog-volume-file-upload-size-limit-increase)\n+ [New notebook cell results 
rendering available (Public Preview)](https://docs.databricks.com/release-notes/product/2023/december.html#new-notebook-cell-results-rendering-available-public-preview)\n+ [Notebook editor themes available](https://docs.databricks.com/release-notes/product/2023/december.html#notebook-editor-themes-available)\n+ [External models support in Model Serving is in Public Preview](https://docs.databricks.com/release-notes/product/2023/december.html#external-models-support-in-model-serving-is-in-public-preview)\n+ [Databricks Online Tables is Public Preview](https://docs.databricks.com/release-notes/product/2023/december.html#databricks-online-tables-is-public-preview)\n+ [Repos & Git Integration Settings UI now correctly notes support for GitHub Enterprise Server](https://docs.databricks.com/release-notes/product/2023/december.html#repos--git-integration-settings-ui-now-correctly-notes-support-for-github-enterprise-server)\n+ [Databricks JDBC driver 2.6.36](https://docs.databricks.com/release-notes/product/2023/december.html#databricks-jdbc-driver-2636)\n+ [Support for referencing workspace files from init scripts](https://docs.databricks.com/release-notes/product/2023/december.html#support-for-referencing-workspace-files-from-init-scripts)\n+ [Feature & Function Serving is Public Preview](https://docs.databricks.com/release-notes/product/2023/december.html#feature--function-serving-is-public-preview)\n+ [Foundation Model APIs is Public Preview](https://docs.databricks.com/release-notes/product/2023/december.html#foundation-model-apis-is-public-preview)\n+ [New unified admin settings UI](https://docs.databricks.com/release-notes/product/2023/december.html#new-unified-admin-settings-ui)\n+ [Init scripts on DBFS are end-of-life](https://docs.databricks.com/release-notes/product/2023/december.html#init-scripts-on-dbfs-are-end-of-life)\n+ [Legacy global and cluster-named init scripts are 
end-of-life](https://docs.databricks.com/release-notes/product/2023/december.html#legacy-global-and-cluster-named-init-scripts-are-end-of-life)\n* [November 2023](https://docs.databricks.com/release-notes/product/2023/november.html)\n+ [Databricks Vector Search is Public Preview](https://docs.databricks.com/release-notes/product/2023/november.html#databricks-vector-search-is-public-preview)\n+ [IAM policies for storage credentials now require an external ID](https://docs.databricks.com/release-notes/product/2023/november.html#iam-policies-for-storage-credentials-now-require-an-external-id)\n+ [Access controls lists can no longer be disabled](https://docs.databricks.com/release-notes/product/2023/november.html#access-controls-lists-can-no-longer-be-disabled)\n+ [AI assistive features are enabled by default](https://docs.databricks.com/release-notes/product/2023/november.html#ai-assistive-features-are-enabled-by-default)\n+ [New behaviors and actions in Catalog Explorer for volumes](https://docs.databricks.com/release-notes/product/2023/november.html#new-behaviors-and-actions-in-catalog-explorer-for-volumes)\n+ [Databricks Runtime 14.2 is GA](https://docs.databricks.com/release-notes/product/2023/november.html#databricks-runtime-142-is-ga)\n+ [Databricks SQL Connector for Python version 3.0.0](https://docs.databricks.com/release-notes/product/2023/november.html#databricks-sql-connector-for-python-version-300)\n+ [Libraries in workspace files supported on no-isolation shared clusters](https://docs.databricks.com/release-notes/product/2023/november.html#libraries-in-workspace-files-supported-on-no-isolation-shared-clusters)\n+ [Deprecation of workspace libraries](https://docs.databricks.com/release-notes/product/2023/november.html#deprecation-of-workspace-libraries)\n+ [Delegate the ability to create a storage credential in Unity 
Catalog](https://docs.databricks.com/release-notes/product/2023/november.html#delegate-the-ability-to-create-a-storage-credential-in-unity-catalog)\n+ [Search for Databricks Marketplace listings using global workspace search](https://docs.databricks.com/release-notes/product/2023/november.html#search-for-databricks-marketplace-listings-using-global-workspace-search)\n+ [Consume data products in Databricks Marketplace using external platforms](https://docs.databricks.com/release-notes/product/2023/november.html#consume-data-products-in-databricks-marketplace-using-external-platforms)\n+ [Automatic enablement of Unity Catalog for new workspaces](https://docs.databricks.com/release-notes/product/2023/november.html#automatic-enablement-of-unity-catalog-for-new-workspaces)\n+ [Authentication using OAuth is GA](https://docs.databricks.com/release-notes/product/2023/november.html#authentication-using-oauth-is-ga)\n+ [Databricks Runtime 14.2 (beta)](https://docs.databricks.com/release-notes/product/2023/november.html#databricks-runtime-142-beta)\n+ [Databricks Marketplace includes Databricks Solution Accelerators](https://docs.databricks.com/release-notes/product/2023/november.html#databricks-marketplace-includes-databricks-solution-accelerators)\n+ [Lakehouse Federation adds support for Google BigQuery](https://docs.databricks.com/release-notes/product/2023/november.html#lakehouse-federation-adds-support-for-google-bigquery)\n* [October 2023](https://docs.databricks.com/release-notes/product/2023/october.html)\n+ [Published partner OAuth applications are easier to use by default](https://docs.databricks.com/release-notes/product/2023/october.html#published-partner-oauth-applications-are-easier-to-use-by-default)\n+ [View the YAML source for a Databricks job](https://docs.databricks.com/release-notes/product/2023/october.html#view-the-yaml-source-for-a-databricks-job)\n+ [Add conditional logic to your Databricks 
workflows](https://docs.databricks.com/release-notes/product/2023/october.html#add-conditional-logic-to-your-databricks-workflows)\n+ [Configure parameters on a Databricks job that can be referenced by all job tasks](https://docs.databricks.com/release-notes/product/2023/october.html#configure-parameters-on-a-databricks-job-that-can-be-referenced-by-all-job-tasks)\n+ [Support for new GPU instance types](https://docs.databricks.com/release-notes/product/2023/october.html#support-for-new-gpu-instance-types)\n+ [Auto-enable deletion vectors](https://docs.databricks.com/release-notes/product/2023/october.html#auto-enable-deletion-vectors)\n+ [Unity Catalog support for `UNDROP TABLE` is GA](https://docs.databricks.com/release-notes/product/2023/october.html#unity-catalog-support-for-undrop-table-is-ga)\n+ [Partner Connect supports Dataiku](https://docs.databricks.com/release-notes/product/2023/october.html#partner-connect-supports-dataiku)\n+ [Databricks AutoML generated notebooks are now saved as MLflow artifacts](https://docs.databricks.com/release-notes/product/2023/october.html#databricks-automl-generated-notebooks-are-now-saved-as-mlflow-artifacts)\n+ [Predictive optimization (Public Preview)](https://docs.databricks.com/release-notes/product/2023/october.html#predictive-optimization-public-preview)\n+ [Compute system tables are now available (Public Preview)](https://docs.databricks.com/release-notes/product/2023/october.html#compute-system-tables-are-now-available-public-preview)\n+ [On-demand feature computation is GA](https://docs.databricks.com/release-notes/product/2023/october.html#on-demand-feature-computation-is-ga)\n+ [Feature Engineering in Unity Catalog is GA](https://docs.databricks.com/release-notes/product/2023/october.html#feature-engineering-in-unity-catalog-is-ga)\n+ [AI-generated table comments (Public Preview)](https://docs.databricks.com/release-notes/product/2023/october.html#ai-generated-table-comments-public-preview)\n+ [Compliance security 
profile works with serverless SQL warehouses in `ap-southeast-2` region (Public Preview)](https://docs.databricks.com/release-notes/product/2023/october.html#compliance-security-profile-works-with-serverless-sql-warehouses-in-ap-southeast-2-region-public-preview)\n+ [Models in Unity Catalog is GA](https://docs.databricks.com/release-notes/product/2023/october.html#models-in-unity-catalog-is-ga)\n+ [Libraries now supported in compute policies (Public Preview)](https://docs.databricks.com/release-notes/product/2023/october.html#libraries-now-supported-in-compute-policies-public-preview)\n+ [Partner Connect supports Monte Carlo](https://docs.databricks.com/release-notes/product/2023/october.html#partner-connect-supports-monte-carlo)\n+ [Semantic search (Public Preview)](https://docs.databricks.com/release-notes/product/2023/october.html#semantic-search-public-preview)\n+ [Enable Databricks Assistant at the workspace level](https://docs.databricks.com/release-notes/product/2023/october.html#enable-databricks-assistant-at-the-workspace-level)\n+ [IP access lists for the account console is GA](https://docs.databricks.com/release-notes/product/2023/october.html#ip-access-lists-for-the-account-console-is-ga)\n+ [New Photon defaults](https://docs.databricks.com/release-notes/product/2023/october.html#new-photon-defaults)\n+ [Databricks Runtime 14.1 is GA](https://docs.databricks.com/release-notes/product/2023/october.html#databricks-runtime-141-is-ga)\n+ [Developer tools release notes have moved](https://docs.databricks.com/release-notes/product/2023/october.html#developer-tools-release-notes-have-moved)\n+ [Databricks extension for Visual Studio Code updated to version 1.1.5](https://docs.databricks.com/release-notes/product/2023/october.html#databricks-extension-for-visual-studio-code-updated-to-version-115)\n+ [Predictive I/O for updates is GA](https://docs.databricks.com/release-notes/product/2023/october.html#predictive-io-for-updates-is-ga)\n+ [Deletion vectors are 
GA](https://docs.databricks.com/release-notes/product/2023/october.html#deletion-vectors-are-ga)\n+ [Automatic enablement of Unity Catalog for new workspaces](https://docs.databricks.com/release-notes/product/2023/october.html#automatic-enablement-of-unity-catalog-for-new-workspaces)\n+ [Infosec Registered Assessors Program (IRAP) compliance controls](https://docs.databricks.com/release-notes/product/2023/october.html#infosec-registered-assessors-program-irap-compliance-controls)\n+ [Partner Connect supports RudderStack](https://docs.databricks.com/release-notes/product/2023/october.html#partner-connect-supports-rudderstack)\n+ [Databricks CLI updated to version 0.207.0 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/october.html#databricks-cli-updated-to-version-02070-public-preview)\n+ [Run selected cells in a notebook](https://docs.databricks.com/release-notes/product/2023/october.html#run-selected-cells-in-a-notebook)\n+ [Use workspace-catalog binding to give read-only access to a catalog](https://docs.databricks.com/release-notes/product/2023/october.html#use-workspace-catalog-binding-to-give-read-only-access-to-a-catalog)\n+ [New in-product Help experience (Public Preview)](https://docs.databricks.com/release-notes/product/2023/october.html#new-in-product-help-experience-public-preview)\n+ [Databricks extension for Visual Studio Code updated to version 1.1.4](https://docs.databricks.com/release-notes/product/2023/october.html#databricks-extension-for-visual-studio-code-updated-to-version-114)\n+ [Databricks SDK for Python updated to version 0.10.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/october.html#databricks-sdk-for-python-updated-to-version-0100-beta)\n+ [Databricks SDK for Go updated to version 0.22.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/october.html#databricks-sdk-for-go-updated-to-version-0220-beta)\n* [September 
2023](https://docs.databricks.com/release-notes/product/2023/september.html)\n+ [Lakehouse Federation available on single-user clusters](https://docs.databricks.com/release-notes/product/2023/september.html#lakehouse-federation-available-on-single-user-clusters)\n+ [Databricks Runtime 14.1 (Beta)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-runtime-141-beta)\n+ [GPU model serving optimized for LLMs in Public Preview](https://docs.databricks.com/release-notes/product/2023/september.html#gpu-model-serving-optimized-for-llms-in-public-preview)\n+ [Prevent runs of a job from being skipped when you reach concurrency limits](https://docs.databricks.com/release-notes/product/2023/september.html#prevent-runs-of-a-job-from-being-skipped-when-you-reach-concurrency-limits)\n+ [Databricks Terraform provider updated to version 1.27.0](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-terraform-provider-updated-to-version-1270)\n+ [Databricks CLI updated to version 0.206.0 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-cli-updated-to-version-02060-public-preview)\n+ [Databricks SDK for Go updated to version 0.21.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-sdk-for-go-updated-to-version-0210-beta)\n+ [Inference tables for model serving endpoints is Public Preview](https://docs.databricks.com/release-notes/product/2023/september.html#inference-tables-for-model-serving-endpoints-is-public-preview)\n+ [Databricks ODBC driver 2.7.5](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-odbc-driver-275)\n+ [Databricks extension for Visual Studio Code updated to version 1.1.3](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-extension-for-visual-studio-code-updated-to-version-113)\n+ [Running jobs as a service principal is 
GA](https://docs.databricks.com/release-notes/product/2023/september.html#running-jobs-as-a-service-principal-is-ga)\n+ [Databricks CLI updated to version 0.205.2 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-cli-updated-to-version-02052-public-preview)\n+ [Databricks CLI updated to version 0.205.1 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-cli-updated-to-version-02051-public-preview)\n+ [Databricks SDK for Go updated to version 0.20.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-sdk-for-go-updated-to-version-0200-beta)\n+ [Databricks SDK for Python updated to version 0.9.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-sdk-for-python-updated-to-version-090-beta)\n+ [Databricks SDK for Python updated to version 0.7.1 (Beta)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-sdk-for-python-updated-to-version-071-beta)\n+ [Databricks Terraform provider updated to version 1.26.0](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-terraform-provider-updated-to-version-1260)\n+ [Databricks Connect for Databricks Runtime 14.0](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-connect-for-databricks-runtime-140)\n+ [Configure Tableau and PowerBI OAuth with SAML SSO](https://docs.databricks.com/release-notes/product/2023/september.html#configure-tableau-and-powerbi-oauth-with-saml-sso)\n+ [Lakeview dashboards in Public Preview](https://docs.databricks.com/release-notes/product/2023/september.html#lakeview-dashboards-in-public-preview)\n+ [Fleet clusters now support Graviton instances](https://docs.databricks.com/release-notes/product/2023/september.html#fleet-clusters-now-support-graviton-instances)\n+ [Databricks Connect V2 is Public Preview for 
Scala](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-connect-v2-is-public-preview-for-scala)\n+ [GPU model serving in Public Preview](https://docs.databricks.com/release-notes/product/2023/september.html#gpu-model-serving-in-public-preview)\n+ [On-demand feature computation now available in Unity Catalog (Public Preview)](https://docs.databricks.com/release-notes/product/2023/september.html#on-demand-feature-computation-now-available-in-unity-catalog-public-preview)\n+ [Unified login for accounts created before June 21, 2023 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/september.html#unified-login-for-accounts-created-before-june-21-2023-public-preview)\n+ [Databricks Runtime 14.0 is GA](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-runtime-140-is-ga)\n+ [Account email notifications upgrade](https://docs.databricks.com/release-notes/product/2023/september.html#account-email-notifications-upgrade)\n+ [Databricks Terraform provider updated to version 1.25.1](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-terraform-provider-updated-to-version-1251)\n+ [Databricks SDK for Go updated to version 0.19.2 (Beta)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-sdk-for-go-updated-to-version-0192-beta)\n+ [Databricks CLI updated to version 0.205.0 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-cli-updated-to-version-02050-public-preview)\n+ [Reserved location for `/Workspace`](https://docs.databricks.com/release-notes/product/2023/september.html#reserved-location-for-workspace)\n+ [Databricks Asset Bundles (Public Preview)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-asset-bundles-public-preview)\n+ [Databricks Terraform provider updated to version 
1.25.0](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-terraform-provider-updated-to-version-1250)\n+ [Databricks extension for Visual Studio Code updated to version 1.1.2](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-extension-for-visual-studio-code-updated-to-version-112)\n+ [Partner Connect supports Snowplow](https://docs.databricks.com/release-notes/product/2023/september.html#partner-connect-supports-snowplow)\n+ [Databricks CLI updated to version 0.204.1 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-cli-updated-to-version-02041-public-preview)\n+ [Filter sensitive data with row filters and column masks (Public Preview)](https://docs.databricks.com/release-notes/product/2023/september.html#filter-sensitive-data-with-row-filters-and-column-masks-public-preview)\n+ [System tables now include Marketplace schema (Public Preview)](https://docs.databricks.com/release-notes/product/2023/september.html#system-tables-now-include-marketplace-schema-public-preview)\n+ [Pricing system table is now available (Public Preview)](https://docs.databricks.com/release-notes/product/2023/september.html#pricing-system-table-is-now-available-public-preview)\n+ [Databricks SDK for Go updated to version 0.19.1 (Beta)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-sdk-for-go-updated-to-version-0191-beta)\n+ [Databricks ODBC driver 2.7.3](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-odbc-driver-273)\n+ [Databricks CLI updated to version 0.204.0 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-cli-updated-to-version-02040-public-preview)\n+ [Data Explorer is now Catalog Explorer](https://docs.databricks.com/release-notes/product/2023/september.html#data-explorer-is-now-catalog-explorer)\n+ [Databricks SDK for Go updated to version 0.19.0 
(Beta)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-sdk-for-go-updated-to-version-0190-beta)\n+ [Databricks SDK for Python updated to version 0.8.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/september.html#databricks-sdk-for-python-updated-to-version-080-beta)\n+ [Delegate allowlist privileges in Unity Catalog](https://docs.databricks.com/release-notes/product/2023/september.html#delegate-allowlist-privileges-in-unity-catalog)\n+ [GitHub apps integration in Repos is GA](https://docs.databricks.com/release-notes/product/2023/september.html#github-apps-integration-in-repos-is-ga)\n* [August 2023](https://docs.databricks.com/release-notes/product/2023/august.html)\n+ [Tables now appear in navigational search](https://docs.databricks.com/release-notes/product/2023/august.html#tables-now-appear-in-navigational-search)\n+ [Databricks CLI updated to version 0.203.3 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-cli-updated-to-version-02033-public-preview)\n+ [Databricks JDBC driver 2.6.34](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-jdbc-driver-2634)\n+ [Databricks SDK for Go updated to version 0.18.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-sdk-for-go-updated-to-version-0180-beta)\n+ [Databricks SDK for Python updated to version 0.7.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-sdk-for-python-updated-to-version-070-beta)\n+ [Databricks Terraform provider updated to version 1.24.1](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-terraform-provider-updated-to-version-1241)\n+ [Init scripts on DBFS end of life extended to Dec 1, 2023](https://docs.databricks.com/release-notes/product/2023/august.html#init-scripts-on-dbfs-end-of-life-extended-to-dec-1-2023)\n+ [Databricks Runtime 14.0 
(Beta)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-runtime-140-beta)\n+ [Custom workspace tags](https://docs.databricks.com/release-notes/product/2023/august.html#custom-workspace-tags)\n+ [Unified navigation experience is GA](https://docs.databricks.com/release-notes/product/2023/august.html#unified-navigation-experience-is-ga)\n+ [Allow additional ports from your classic compute plane to control plane by January 31, 2024](https://docs.databricks.com/release-notes/product/2023/august.html#allow-additional-ports-from-your-classic-compute-plane-to-control-plane-by-january-31-2024)\n+ [Databricks Terraform provider updated to version 1.24.0](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-terraform-provider-updated-to-version-1240)\n+ [Shared access mode clusters can now connect to public networks outside of Databricks VPC](https://docs.databricks.com/release-notes/product/2023/august.html#shared-access-mode-clusters-can-now-connect-to-public-networks-outside-of-databricks-vpc)\n+ [Databricks Runtime for Genomics setting removed from the workspace admin settings page](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-runtime-for-genomics-setting-removed-from-the-workspace-admin-settings-page)\n+ [Container Services setting removed from the workspace admin settings page](https://docs.databricks.com/release-notes/product/2023/august.html#container-services-setting-removed-from-the-workspace-admin-settings-page)\n+ [Databricks CLI updated to version 0.203.2 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-cli-updated-to-version-02032-public-preview)\n+ [Go to definition for functions and variables in Python notebooks](https://docs.databricks.com/release-notes/product/2023/august.html#go-to-definition-for-functions-and-variables-in-python-notebooks)\n+ [Unified schema browser is 
GA](https://docs.databricks.com/release-notes/product/2023/august.html#unified-schema-browser-is-ga)\n+ [Databricks Runtime 13.3 LTS is GA](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-runtime-133-lts-is-ga)\n+ [Introducing tags with Unity Catalog](https://docs.databricks.com/release-notes/product/2023/august.html#introducing-tags-with-unity-catalog)\n+ [Allowlist for init scripts, JARs, and Maven coordinates on Unity Catalog shared clusters is in Public Preview](https://docs.databricks.com/release-notes/product/2023/august.html#allowlist-for-init-scripts-jars-and-maven-coordinates-on-unity-catalog-shared-clusters-is-in-public-preview)\n+ [Volumes support for init scripts and JARs is in Public Preview](https://docs.databricks.com/release-notes/product/2023/august.html#volumes-support-for-init-scripts-and-jars-is-in-public-preview)\n+ [Easier Databricks Repos .ipynb file output commits](https://docs.databricks.com/release-notes/product/2023/august.html#easier-databricks-repos-ipynb-file-output-commits)\n+ [IPYNB notebook support in Databricks Repos is GA](https://docs.databricks.com/release-notes/product/2023/august.html#ipynb-notebook-support-in-databricks-repos-is-ga)\n+ [Databricks SDK for Go updated to version 0.17.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-sdk-for-go-updated-to-version-0170-beta)\n+ [Databricks SDK for Python updated to version 0.6.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-sdk-for-python-updated-to-version-060-beta)\n+ [Databricks CLI updated to version 0.203.1 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-cli-updated-to-version-02031-public-preview)\n+ [Unified Schema Browser is now GA](https://docs.databricks.com/release-notes/product/2023/august.html#unified-schema-browser-is-now-ga)\n+ [Databricks SDK for Go updated to version 0.16.0 
(Beta)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-sdk-for-go-updated-to-version-0160-beta)\n+ [Partner Connect supports Census](https://docs.databricks.com/release-notes/product/2023/august.html#partner-connect-supports-census)\n+ [Databricks SDK for Python updated to version 0.5.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-sdk-for-python-updated-to-version-050-beta)\n+ [Programmatic write support for workspace files](https://docs.databricks.com/release-notes/product/2023/august.html#programmatic-write-support-for-workspace-files)\n+ [Access resources from serving endpoints with instance profiles (Public Preview)](https://docs.databricks.com/release-notes/product/2023/august.html#access-resources-from-serving-endpoints-with-instance-profiles-public-preview)\n+ [Databricks CLI updated to version 0.203.0 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-cli-updated-to-version-02030-public-preview)\n+ [Databricks Terraform provider updated to version 1.23.0](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-terraform-provider-updated-to-version-1230)\n+ [Groups can now be renamed](https://docs.databricks.com/release-notes/product/2023/august.html#groups-can-now-be-renamed)\n+ [Databricks SDK for Go updated to version 0.15.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-sdk-for-go-updated-to-version-0150-beta)\n+ [Databricks SDK for Python updated to version 0.4.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-sdk-for-python-updated-to-version-040-beta)\n+ [Third-party iFraming prevention configuration setting was removed](https://docs.databricks.com/release-notes/product/2023/august.html#third-party-iframing-prevention-configuration-setting-was-removed)\n+ [Databricks extension for Visual Studio Code updated to version 
1.1.1](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-extension-for-visual-studio-code-updated-to-version-111)\n+ [LangChain available in 13.1 and above](https://docs.databricks.com/release-notes/product/2023/august.html#langchain-available-in-131-and-above)\n+ [Feature Engineering in Unity Catalog is Public Preview](https://docs.databricks.com/release-notes/product/2023/august.html#feature-engineering-in-unity-catalog-is-public-preview)\n+ [Improved error handling for repeated continuous job failures](https://docs.databricks.com/release-notes/product/2023/august.html#improved-error-handling-for-repeated-continuous-job-failures)\n+ [Share schemas using Delta Sharing (Public Preview)](https://docs.databricks.com/release-notes/product/2023/august.html#share-schemas-using-delta-sharing-public-preview)\n+ [Run tasks conditionally in your Databricks jobs](https://docs.databricks.com/release-notes/product/2023/august.html#run-tasks-conditionally-in-your-databricks-jobs)\n+ [Compliance security profile works with serverless SQL warehouses in some regions (Public Preview)](https://docs.databricks.com/release-notes/product/2023/august.html#compliance-security-profile-works-with-serverless-sql-warehouses-in-some-regions-public-preview)\n+ [Databricks Terraform provider updated to version 1.22.0](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-terraform-provider-updated-to-version-1220)\n+ [Lakehouse Monitoring is Public Preview](https://docs.databricks.com/release-notes/product/2023/august.html#lakehouse-monitoring-is-public-preview)\n+ [Databricks Runtime 13.3 LTS (Beta)](https://docs.databricks.com/release-notes/product/2023/august.html#databricks-runtime-133-lts-beta)\n+ [New Git operations are generally available: Merge branches, rebase and pull with conflict 
resolution](https://docs.databricks.com/release-notes/product/2023/august.html#new-git-operations-are-generally-available-merge-branches-rebase-and-pull-with-conflict-resolution)\n* [July 2023](https://docs.databricks.com/release-notes/product/2023/july.html)\n+ [Email addresses in Databricks are now case insensitive](https://docs.databricks.com/release-notes/product/2023/july.html#email-addresses-in-databricks-are-now-case-insensitive)\n+ [Workspace admins can now create account groups](https://docs.databricks.com/release-notes/product/2023/july.html#workspace-admins-can-now-create-account-groups)\n+ [Group manager role is in Public Preview](https://docs.databricks.com/release-notes/product/2023/july.html#group-manager-role-is-in-public-preview)\n+ [Databricks CLI updated to version 0.202.0 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/july.html#databricks-cli-updated-to-version-02020-public-preview)\n+ [Databricks SDK for Python updated to version 0.3.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/july.html#databricks-sdk-for-python-updated-to-version-030-beta)\n+ [Databricks SDK for Go updated to version 0.14.1 (Beta)](https://docs.databricks.com/release-notes/product/2023/july.html#databricks-sdk-for-go-updated-to-version-0141-beta)\n+ [Databricks SDK for Go updated to version 0.14.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/july.html#databricks-sdk-for-go-updated-to-version-0140-beta)\n+ [Run another job as a task in a Databricks job](https://docs.databricks.com/release-notes/product/2023/july.html#run-another-job-as-a-task-in-a-databricks-job)\n+ [All users can access data products in Databricks Marketplace by default](https://docs.databricks.com/release-notes/product/2023/july.html#all-users-can-access-data-products-in-databricks-marketplace-by-default)\n+ [Classic keyboard shortcuts mode](https://docs.databricks.com/release-notes/product/2023/july.html#classic-keyboard-shortcuts-mode)\n+ 
[Lakehouse Federation lets you run queries against external database providers (Public Preview)](https://docs.databricks.com/release-notes/product/2023/july.html#lakehouse-federation-lets-you-run-queries-against-external-database-providers-public-preview)\n+ [Move to trash enabled for Repos](https://docs.databricks.com/release-notes/product/2023/july.html#move-to-trash-enabled-for-repos)\n+ [Create alerts for slow-running or stuck jobs](https://docs.databricks.com/release-notes/product/2023/july.html#create-alerts-for-slow-running-or-stuck-jobs)\n+ [Databricks SDK for Go updated to version 0.13.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/july.html#databricks-sdk-for-go-updated-to-version-0130-beta)\n+ [Databricks SDK for Python updated to version 0.2.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/july.html#databricks-sdk-for-python-updated-to-version-020-beta)\n+ [Databricks CLI updated to version 0.201.0 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/july.html#databricks-cli-updated-to-version-02010-public-preview)\n+ [Databricks SDK for Python updated to version 0.2.1 (Beta)](https://docs.databricks.com/release-notes/product/2023/july.html#databricks-sdk-for-python-updated-to-version-021-beta)\n+ [Databricks Assistant is in Public Preview](https://docs.databricks.com/release-notes/product/2023/july.html#databricks-assistant-is-in-public-preview)\n+ [Deactivate users and service principals from your account](https://docs.databricks.com/release-notes/product/2023/july.html#deactivate-users-and-service-principals-from-your-account)\n+ [Account-level SCIM provisioning now deactivates users when they are deactivated in the identity provider](https://docs.databricks.com/release-notes/product/2023/july.html#account-level-scim-provisioning-now-deactivates-users-when-they-are-deactivated-in-the-identity-provider)\n+ [Trash directory admin 
access](https://docs.databricks.com/release-notes/product/2023/july.html#trash-directory-admin-access)\n+ [Prevention of MIME type sniffing and XSS attack page rendering are now always enabled](https://docs.databricks.com/release-notes/product/2023/july.html#prevention-of-mime-type-sniffing-and-xss-attack-page-rendering-are-now-always-enabled)\n+ [Unity Catalog volumes are in Public Preview](https://docs.databricks.com/release-notes/product/2023/july.html#unity-catalog-volumes-are-in-public-preview)\n+ [Simplified experience for submitting product feedback from the workspace](https://docs.databricks.com/release-notes/product/2023/july.html#simplified-experience-for-submitting-product-feedback-from-the-workspace)\n+ [Databricks extension for Visual Studio Code updated to version 1.1.0](https://docs.databricks.com/release-notes/product/2023/july.html#databricks-extension-for-visual-studio-code-updated-to-version-110)\n+ [Functions now displayed in Catalog Explorer (Public Preview)](https://docs.databricks.com/release-notes/product/2023/july.html#functions-now-displayed-in-catalog-explorer-public-preview)\n+ [Databricks Terraform provider updated to version 1.21.0](https://docs.databricks.com/release-notes/product/2023/july.html#databricks-terraform-provider-updated-to-version-1210)\n+ [The maximum offset for the `List all jobs` and `List job runs` API requests is now limited](https://docs.databricks.com/release-notes/product/2023/july.html#the-maximum-offset-for-the-list-all-jobs-and-list-job-runs-api-requests-is-now-limited)\n+ [Databricks Runtime 13.2 is GA](https://docs.databricks.com/release-notes/product/2023/july.html#databricks-runtime-132-is-ga)\n+ [Delta Sharing and Databricks Marketplace support view sharing (Public Preview)](https://docs.databricks.com/release-notes/product/2023/july.html#delta-sharing-and-databricks-marketplace-support-view-sharing-public-preview)\n+ [Init scripts on DBFS reach end of life on Sept 1, 
2023](https://docs.databricks.com/release-notes/product/2023/july.html#init-scripts-on-dbfs-reach-end-of-life-on-sept-1-2023)\n* [June 2023](https://docs.databricks.com/release-notes/product/2023/june.html)\n+ [Databricks Terraform provider updated to version 1.20.0](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-terraform-provider-updated-to-version-1200)\n+ [Databricks CLI updated to version 0.200.1 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-cli-updated-to-version-02001-public-preview)\n+ [Databricks SDK for Go updated to version 0.12.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-sdk-for-go-updated-to-version-0120-beta)\n+ [Databricks SDK for Go updated to version 0.11.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-sdk-for-go-updated-to-version-0110-beta)\n+ [Databricks SDK for Python (Beta)](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-sdk-for-python-beta)\n+ [Databricks SDK for Go (Beta)](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-sdk-for-go-beta)\n+ [Access audit log, billable usage, and lineage system tables (Public Preview)](https://docs.databricks.com/release-notes/product/2023/june.html#access-audit-log-billable-usage-and-lineage-system-tables-public-preview)\n+ [Models in Unity Catalog (Public Preview)](https://docs.databricks.com/release-notes/product/2023/june.html#models-in-unity-catalog-public-preview)\n+ [Databricks Marketplace is now GA](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-marketplace-is-now-ga)\n+ [Databricks Runtime 13.2 (Beta)](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-runtime-132-beta)\n+ [Databricks extension for Visual Studio Code (General 
Availability)](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-extension-for-visual-studio-code-general-availability)\n+ [See a visual overview of completed job runs in the Databricks Jobs UI](https://docs.databricks.com/release-notes/product/2023/june.html#see-a-visual-overview-of-completed-job-runs-in-the-databricks-jobs-ui)\n+ [Databricks CLI (Public Preview)](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-cli-public-preview)\n+ [Unified Login GA for new accounts](https://docs.databricks.com/release-notes/product/2023/june.html#unified-login-ga-for-new-accounts)\n+ [Test single sign-on](https://docs.databricks.com/release-notes/product/2023/june.html#test-single-sign-on)\n+ [Improved pagination of results from `List all jobs` and `List job runs` API requests](https://docs.databricks.com/release-notes/product/2023/june.html#improved-pagination-of-results-from-list-all-jobs-and-list-job-runs-api-requests)\n+ [Full-page workspace browser includes Repos](https://docs.databricks.com/release-notes/product/2023/june.html#full-page-workspace-browser-includes-repos)\n+ [Databricks Terraform provider updated to version 1.19.0](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-terraform-provider-updated-to-version-1190)\n+ [Run jobs as a service principal (Public Preview)](https://docs.databricks.com/release-notes/product/2023/june.html#run-jobs-as-a-service-principal-public-preview)\n+ [New service principal UI provides better management experience](https://docs.databricks.com/release-notes/product/2023/june.html#new-service-principal-ui-provides-better-management-experience)\n+ [Home folders restored when users are re-added to workspaces](https://docs.databricks.com/release-notes/product/2023/june.html#home-folders-restored-when-users-are-re-added-to-workspaces)\n+ [Databricks Marketplace: Private exchanges are now 
available](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-marketplace-private-exchanges-are-now-available)\n+ [Databricks Marketplace: Consumers can uninstall data products using the UI](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-marketplace-consumers-can-uninstall-data-products-using-the-ui)\n+ [Databricks Marketplace: providers can create their own profiles](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-marketplace-providers-can-create-their-own-profiles)\n+ [Databricks Connect V2 is GA for Python](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-connect-v2-is-ga-for-python)\n+ [Databricks Terraform provider updated to version 1.18.0](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-terraform-provider-updated-to-version-1180)\n+ [New Databricks Marketplace providers](https://docs.databricks.com/release-notes/product/2023/june.html#new-databricks-marketplace-providers)\n+ [Databricks Runtime 13.1 is GA](https://docs.databricks.com/release-notes/product/2023/june.html#databricks-runtime-131-is-ga)\n+ [Partner Connect supports Hunters](https://docs.databricks.com/release-notes/product/2023/june.html#partner-connect-supports-hunters)\n+ [View data from notebooks, SQL editor, and Catalog Explorer with the unified schema browser (Public Preview)](https://docs.databricks.com/release-notes/product/2023/june.html#view-data-from-notebooks-sql-editor-and-catalog-explorer-with-the-unified-schema-browser-public-preview)\n* [May 2023](https://docs.databricks.com/release-notes/product/2023/may.html)\n+ [Create or modify table UI supports Avro, Parquet, and text file uploads](https://docs.databricks.com/release-notes/product/2023/may.html#create-or-modify-table-ui-supports-avro-parquet-and-text-file-uploads)\n+ [Use the add data UI to load data using Unity Catalog external locations (Public 
Preview)](https://docs.databricks.com/release-notes/product/2023/may.html#use-the-add-data-ui-to-load-data-using-unity-catalog-external-locations-public-preview)\n+ [Run Databricks notebooks on SQL warehouses (Public Preview)](https://docs.databricks.com/release-notes/product/2023/may.html#run-databricks-notebooks-on-sql-warehouses-public-preview)\n+ [Prevent Enter key from accepting autocomplete suggestion](https://docs.databricks.com/release-notes/product/2023/may.html#prevent-enter-key-from-accepting-autocomplete-suggestion)\n+ [Databricks Terraform provider updated to version 1.17.0](https://docs.databricks.com/release-notes/product/2023/may.html#databricks-terraform-provider-updated-to-version-1170)\n+ [Upload data UI supports new column data types](https://docs.databricks.com/release-notes/product/2023/may.html#upload-data-ui-supports-new-column-data-types)\n+ [Configure your workspace to use IMDS v2 (GA)](https://docs.databricks.com/release-notes/product/2023/may.html#configure-your-workspace-to-use-imds-v2-ga)\n+ [M7g and R7g Graviton3 instances are now supported on Databricks](https://docs.databricks.com/release-notes/product/2023/may.html#m7g-and-r7g-graviton3-instances-are-now-supported-on-databricks)\n+ [Bind Unity Catalog catalogs to specific workspaces](https://docs.databricks.com/release-notes/product/2023/may.html#bind-unity-catalog-catalogs-to-specific-workspaces)\n+ [All users can connect to Fivetran using Partner Connect (Public Preview)](https://docs.databricks.com/release-notes/product/2023/may.html#all-users-can-connect-to-fivetran-using-partner-connect-public-preview)\n+ [Authentication using OAuth tokens for service principals (Public Preview)](https://docs.databricks.com/release-notes/product/2023/may.html#authentication-using-oauth-tokens-for-service-principals-public-preview)\n+ [Schedule automatic cluster update (Public 
Preview)](https://docs.databricks.com/release-notes/product/2023/may.html#schedule-automatic-cluster-update-public-preview)\n+ [Databricks Terraform provider updated to version 1.16.1](https://docs.databricks.com/release-notes/product/2023/may.html#databricks-terraform-provider-updated-to-version-1161)\n+ [Databricks JDBC driver 2.6.33](https://docs.databricks.com/release-notes/product/2023/may.html#databricks-jdbc-driver-2633)\n+ [Partner Connect supports Alation](https://docs.databricks.com/release-notes/product/2023/may.html#partner-connect-supports-alation)\n+ [New default theme for editor](https://docs.databricks.com/release-notes/product/2023/may.html#new-default-theme-for-editor)\n+ [Databricks Terraform provider updated to version 1.16.0](https://docs.databricks.com/release-notes/product/2023/may.html#databricks-terraform-provider-updated-to-version-1160)\n+ [New region: Europe (Paris)](https://docs.databricks.com/release-notes/product/2023/may.html#new-region-europe-paris)\n+ [Compliance security profile now supports more EC2 instance types](https://docs.databricks.com/release-notes/product/2023/may.html#compliance-security-profile-now-supports-more-ec2-instance-types)\n+ [Databricks Runtime 13.1 (Beta)](https://docs.databricks.com/release-notes/product/2023/may.html#databricks-runtime-131-beta)\n+ [Run file-based SQL queries in a Databricks workflow](https://docs.databricks.com/release-notes/product/2023/may.html#run-file-based-sql-queries-in-a-databricks-workflow)\n+ [Databricks Terraform provider updated to version 1.15.0](https://docs.databricks.com/release-notes/product/2023/may.html#databricks-terraform-provider-updated-to-version-1150)\n+ [Account nicknames now available in the account console](https://docs.databricks.com/release-notes/product/2023/may.html#account-nicknames-now-available-in-the-account-console)\n+ [Share notebooks using Delta 
Sharing](https://docs.databricks.com/release-notes/product/2023/may.html#share-notebooks-using-delta-sharing)\n+ [Deprecation of cluster-scoped init scripts on DBFS](https://docs.databricks.com/release-notes/product/2023/may.html#deprecation-of-cluster-scoped-init-scripts-on-dbfs)\n+ [New region: South America (S\u00e3o Paulo)](https://docs.databricks.com/release-notes/product/2023/may.html#new-region-south-america-s\u00e3o-paulo)\n+ [Unified navigation (Public Preview)](https://docs.databricks.com/release-notes/product/2023/may.html#unified-navigation-public-preview)\n+ [AWS fleet instance types now available](https://docs.databricks.com/release-notes/product/2023/may.html#aws-fleet-instance-types-now-available)\n* [April 2023](https://docs.databricks.com/release-notes/product/2023/april.html)\n+ [Databricks Marketplace (Public Preview): an open marketplace for data, analytics, and AI](https://docs.databricks.com/release-notes/product/2023/april.html#databricks-marketplace-public-preview-an-open-marketplace-for-data-analytics-and-ai)\n+ [Configure the Python formatter](https://docs.databricks.com/release-notes/product/2023/april.html#configure-the-python-formatter)\n+ [Workspace files are GA](https://docs.databricks.com/release-notes/product/2023/april.html#workspace-files-are-ga)\n+ [Cluster-scoped init scripts can now be stored in workspace files](https://docs.databricks.com/release-notes/product/2023/april.html#cluster-scoped-init-scripts-can-now-be-stored-in-workspace-files)\n+ [New Delta Sharing privileges enable delegation of share, recipient, and provider management tasks](https://docs.databricks.com/release-notes/product/2023/april.html#new-delta-sharing-privileges-enable-delegation-of-share-recipient-and-provider-management-tasks)\n+ [Databricks Connect V2 (Public Preview)](https://docs.databricks.com/release-notes/product/2023/april.html#databricks-connect-v2-public-preview)\n+ [New cluster metrics 
UI](https://docs.databricks.com/release-notes/product/2023/april.html#new-cluster-metrics-ui)\n+ [Databricks Runtime 13.0 is GA](https://docs.databricks.com/release-notes/product/2023/april.html#databricks-runtime-130-is-ga)\n+ [Control access to the account console by IP address ranges (Public Preview)](https://docs.databricks.com/release-notes/product/2023/april.html#control-access-to-the-account-console-by-ip-address-ranges-public-preview)\n+ [Databricks Terraform provider updated to version 1.14.3](https://docs.databricks.com/release-notes/product/2023/april.html#databricks-terraform-provider-updated-to-version-1143)\n+ [Audit log entries for changed admin settings for workspaces and accounts](https://docs.databricks.com/release-notes/product/2023/april.html#audit-log-entries-for-changed-admin-settings-for-workspaces-and-accounts)\n+ [Workspaces with security profile or ESM include audit logs rows for system and monitor logs](https://docs.databricks.com/release-notes/product/2023/april.html#workspaces-with-security-profile-or-esm-include-audit-logs-rows-for-system-and-monitor-logs)\n+ [Databricks Terraform provider updated to versions 1.14.1 and 1.14.2](https://docs.databricks.com/release-notes/product/2023/april.html#databricks-terraform-provider-updated-to-versions-1141-and-1142)\n+ [Combined SQL user settings and general user settings](https://docs.databricks.com/release-notes/product/2023/april.html#combined-sql-user-settings-and-general-user-settings)\n+ [Legacy notebook visualizations deprecated](https://docs.databricks.com/release-notes/product/2023/april.html#legacy-notebook-visualizations-deprecated)\n+ [Create or modify table from file upload page supports JSON file uploads](https://docs.databricks.com/release-notes/product/2023/april.html#create-or-modify-table-from-file-upload-page-supports-json-file-uploads)\n* [March 2023](https://docs.databricks.com/release-notes/product/2023/march.html)\n+ [Databricks Terraform provider updated to version 
1.14.0](https://docs.databricks.com/release-notes/product/2023/march.html#databricks-terraform-provider-updated-to-version-1140)\n+ [Databricks Runtime 7.3 LTS ML support ends](https://docs.databricks.com/release-notes/product/2023/march.html#databricks-runtime-73-lts-ml-support-ends)\n+ [C7g Graviton 3 instances are now supported on Databricks](https://docs.databricks.com/release-notes/product/2023/march.html#c7g-graviton-3-instances-are-now-supported-on-databricks)\n+ [Distributed training with TorchDistributor](https://docs.databricks.com/release-notes/product/2023/march.html#distributed-training-with-torchdistributor)\n+ [Databricks Runtime 13.0 (Beta)](https://docs.databricks.com/release-notes/product/2023/march.html#databricks-runtime-130-beta)\n+ [Improved file editor](https://docs.databricks.com/release-notes/product/2023/march.html#improved-file-editor)\n+ [Databricks no longer creates a serverless starter SQL warehouse](https://docs.databricks.com/release-notes/product/2023/march.html#databricks-no-longer-creates-a-serverless-starter-sql-warehouse)\n+ [In SQL Warehouses API, enabling serverless compute now must be explicit](https://docs.databricks.com/release-notes/product/2023/march.html#in-sql-warehouses-api-enabling-serverless-compute-now-must-be-explicit)\n+ [Changes for workspace settings for serverless SQL warehouses](https://docs.databricks.com/release-notes/product/2023/march.html#changes-for-workspace-settings-for-serverless-sql-warehouses)\n+ [Changes for serverless compute settings for accounts and workspaces](https://docs.databricks.com/release-notes/product/2023/march.html#changes-for-serverless-compute-settings-for-accounts-and-workspaces)\n+ [Databricks SQL Serverless is GA](https://docs.databricks.com/release-notes/product/2023/march.html#databricks-sql-serverless-is-ga)\n+ [.ipynb (Jupyter) notebook support in Repos 
(preview)](https://docs.databricks.com/release-notes/product/2023/march.html#ipynb-jupyter-notebook-support-in-repos-preview)\n+ [Support for reload4j](https://docs.databricks.com/release-notes/product/2023/march.html#support-for-reload4j)\n+ [Execute SQL cells in the notebook in parallel](https://docs.databricks.com/release-notes/product/2023/march.html#execute-sql-cells-in-the-notebook-in-parallel)\n+ [Create job tasks using Python code stored in a Git repo](https://docs.databricks.com/release-notes/product/2023/march.html#create-job-tasks-using-python-code-stored-in-a-git-repo)\n+ [Databricks Terraform provider updated to version 1.13.0](https://docs.databricks.com/release-notes/product/2023/march.html#databricks-terraform-provider-updated-to-version-1130)\n+ [Databricks Terraform provider updated to version 1.12.0](https://docs.databricks.com/release-notes/product/2023/march.html#databricks-terraform-provider-updated-to-version-1120)\n+ [SQL admin console and workspace admin console combined](https://docs.databricks.com/release-notes/product/2023/march.html#sql-admin-console-and-workspace-admin-console-combined)\n+ [Model Serving is GA](https://docs.databricks.com/release-notes/product/2023/march.html#model-serving-is-ga)\n+ [Automatic feature lookup is GA](https://docs.databricks.com/release-notes/product/2023/march.html#automatic-feature-lookup-is-ga)\n+ [New Catalog Explorer availability](https://docs.databricks.com/release-notes/product/2023/march.html#new-catalog-explorer-availability)\n+ [View frequent queries and users of a table using the Insights tab](https://docs.databricks.com/release-notes/product/2023/march.html#view-frequent-queries-and-users-of-a-table-using-the-insights-tab)\n+ [Exact match search is available in global search](https://docs.databricks.com/release-notes/product/2023/march.html#exact-match-search-is-available-in-global-search)\n+ [View lineage information for your Databricks 
jobs](https://docs.databricks.com/release-notes/product/2023/march.html#view-lineage-information-for-your-databricks-jobs)\n+ [Databricks Runtime 12.2 LTS and Databricks Runtime 12.2 LTS ML are GA](https://docs.databricks.com/release-notes/product/2023/march.html#databricks-runtime-122-lts-and-databricks-runtime-122-lts-ml-are-ga)\n+ [Workspace files are now in Public Preview](https://docs.databricks.com/release-notes/product/2023/march.html#workspace-files-are-now-in-public-preview)\n* [February 2023](https://docs.databricks.com/release-notes/product/2023/february.html)\n+ [SAML single sign-on (SSO) in the account console is generally available](https://docs.databricks.com/release-notes/product/2023/february.html#saml-single-sign-on-sso-in-the-account-console-is-generally-available)\n+ [Ray on Databricks (Public Preview)](https://docs.databricks.com/release-notes/product/2023/february.html#ray-on-databricks-public-preview)\n+ [Notebook cell output results limit increased](https://docs.databricks.com/release-notes/product/2023/february.html#notebook-cell-output-results-limit-increased)\n+ [Databricks Jobs now supports running continuous jobs](https://docs.databricks.com/release-notes/product/2023/february.html#databricks-jobs-now-supports-running-continuous-jobs)\n+ [Trigger your Databricks job when new files arrive](https://docs.databricks.com/release-notes/product/2023/february.html#trigger-your-databricks-job-when-new-files-arrive)\n+ [Databricks Terraform provider updated to version 1.10.0](https://docs.databricks.com/release-notes/product/2023/february.html#databricks-terraform-provider-updated-to-version-1100)\n+ [Legacy global init scripts and cluster-named init scripts disabled](https://docs.databricks.com/release-notes/product/2023/february.html#legacy-global-init-scripts-and-cluster-named-init-scripts-disabled)\n+ [Improvements to the MLflow experiment 
UI](https://docs.databricks.com/release-notes/product/2023/february.html#improvements-to-the-mlflow-experiment-ui)\n+ [Databricks Runtime 12.2 (Beta)](https://docs.databricks.com/release-notes/product/2023/february.html#databricks-runtime-122-beta)\n+ [Databricks extension for Visual Studio Code (Public Preview)](https://docs.databricks.com/release-notes/product/2023/february.html#databricks-extension-for-visual-studio-code-public-preview)\n+ [Serverless Real-Time Inference Public Preview now available to all customers](https://docs.databricks.com/release-notes/product/2023/february.html#serverless-real-time-inference-public-preview-now-available-to-all-customers)\n+ [Databricks Terraform provider updated to version 1.9.2](https://docs.databricks.com/release-notes/product/2023/february.html#databricks-terraform-provider-updated-to-version-192)\n+ [Variable explorer in Databricks notebooks](https://docs.databricks.com/release-notes/product/2023/february.html#variable-explorer-in-databricks-notebooks)\n* [January 2023](https://docs.databricks.com/release-notes/product/2023/january.html)\n+ [Authenticate to Power BI and Tableau using OAuth](https://docs.databricks.com/release-notes/product/2023/january.html#authenticate-to-power-bi-and-tableau-using-oauth)\n+ [Audit logs include entries for OAuth SSO authentication to the account console (Public Preview)](https://docs.databricks.com/release-notes/product/2023/january.html#audit-logs-include-entries-for-oauth-sso-authentication-to-the-account-console-public-preview)\n+ [Account SCIM is now GA](https://docs.databricks.com/release-notes/product/2023/january.html#account-scim-is-now-ga)\n+ [Easier creation and editing of Databricks jobs in the UI](https://docs.databricks.com/release-notes/product/2023/january.html#easier-creation-and-editing-of-databricks-jobs-in-the-ui)\n+ [Improvements to the Databricks Jobs UI when viewing job 
runs](https://docs.databricks.com/release-notes/product/2023/january.html#improvements-to-the-databricks-jobs-ui-when-viewing-job-runs)\n+ [REST API Reference is now available for browsing API documentation](https://docs.databricks.com/release-notes/product/2023/january.html#rest-api-reference-is-now-available-for-browsing-api-documentation)\n+ [New account console home screen provides better account management experience](https://docs.databricks.com/release-notes/product/2023/january.html#new-account-console-home-screen-provides-better-account-management-experience)\n+ [Databricks Terraform provider updated to version 1.9.1](https://docs.databricks.com/release-notes/product/2023/january.html#databricks-terraform-provider-updated-to-version-191)\n+ [Account users can update email preferences in the account console](https://docs.databricks.com/release-notes/product/2023/january.html#account-users-can-update-email-preferences-in-the-account-console)\n+ [Region support consolidated onto one page](https://docs.databricks.com/release-notes/product/2023/january.html#region-support-consolidated-onto-one-page)\n+ [Databricks Runtime 12.1 and Databricks Runtime 12.1 ML are GA](https://docs.databricks.com/release-notes/product/2023/january.html#databricks-runtime-121-and-databricks-runtime-121-ml-are-ga)\n+ [Cluster policies now support limiting the max number of clusters per user](https://docs.databricks.com/release-notes/product/2023/january.html#cluster-policies-now-support-limiting-the-max-number-of-clusters-per-user)\n+ [Databricks Terraform provider updated to version 1.9.0](https://docs.databricks.com/release-notes/product/2023/january.html#databricks-terraform-provider-updated-to-version-190)\n+ [Partner Connect supports connecting to Privacera](https://docs.databricks.com/release-notes/product/2023/january.html#partner-connect-supports-connecting-to-privacera)\n+ [Databricks Terraform provider updated to version 
1.8.0](https://docs.databricks.com/release-notes/product/2023/january.html#databricks-terraform-provider-updated-to-version-180)\n+ [Databricks Runtime 12.1 (Beta)](https://docs.databricks.com/release-notes/product/2023/january.html#databricks-runtime-121-beta)\n+ [Partner Connect supports Sigma](https://docs.databricks.com/release-notes/product/2023/january.html#partner-connect-supports-sigma)\n+ [New left and right sidebars in Databricks notebooks](https://docs.databricks.com/release-notes/product/2023/january.html#new-left-and-right-sidebars-in-databricks-notebooks)\n* [December 2022](https://docs.databricks.com/release-notes/product/2022/december.html)\n+ [Databricks SQL Driver for Go is Generally Available](https://docs.databricks.com/release-notes/product/2022/december.html#databricks-sql-driver-for-go-is-generally-available)\n+ [Prevent concurrent workspace updates](https://docs.databricks.com/release-notes/product/2022/december.html#prevent-concurrent-workspace-updates)\n+ [Databricks Terraform provider updated to version 1.7.0](https://docs.databricks.com/release-notes/product/2022/december.html#databricks-terraform-provider-updated-to-version-170)\n+ [Databricks Runtime 12.0 and 12.0 ML are GA](https://docs.databricks.com/release-notes/product/2022/december.html#databricks-runtime-120-and-120-ml-are-ga)\n+ [Jobs are now available in global search](https://docs.databricks.com/release-notes/product/2022/december.html#jobs-are-now-available-in-global-search)\n+ [Billable usage graphs can now aggregate by individual tags](https://docs.databricks.com/release-notes/product/2022/december.html#billable-usage-graphs-can-now-aggregate-by-individual-tags)\n+ [Use SQL to specify schema- and catalog-level storage locations for Unity Catalog managed tables](https://docs.databricks.com/release-notes/product/2022/december.html#use-sql-to-specify-schema--and-catalog-level-storage-locations-for-unity-catalog-managed-tables)\n+ [Capturing lineage data with Unity Catalog is 
now generally available](https://docs.databricks.com/release-notes/product/2022/december.html#capturing-lineage-data-with-unity-catalog-is-now-generally-available)\n+ [Databricks ODBC driver 2.6.29](https://docs.databricks.com/release-notes/product/2022/december.html#databricks-odbc-driver-2629)\n+ [Databricks JDBC driver 2.6.32](https://docs.databricks.com/release-notes/product/2022/december.html#databricks-jdbc-driver-2632)\n+ [Partner Connect supports connecting to AtScale](https://docs.databricks.com/release-notes/product/2022/december.html#partner-connect-supports-connecting-to-atscale)\n+ [Improved serverless SQL warehouse support for customer-managed keys](https://docs.databricks.com/release-notes/product/2022/december.html#improved-serverless-sql-warehouse-support-for-customer-managed-keys)\n* [November 2022](https://docs.databricks.com/release-notes/product/2022/november.html)\n+ [Enhanced notifications for your Databricks jobs (Public Preview)](https://docs.databricks.com/release-notes/product/2022/november.html#enhanced-notifications-for-your-databricks-jobs-public-preview)\n+ [Databricks Runtime 12.0 (Beta)](https://docs.databricks.com/release-notes/product/2022/november.html#databricks-runtime-120-beta)\n+ [Upload data UI can now be disabled via admin settings](https://docs.databricks.com/release-notes/product/2022/november.html#upload-data-ui-can-now-be-disabled-via-admin-settings)\n+ [Partner Connect support for Unity Catalog is GA](https://docs.databricks.com/release-notes/product/2022/november.html#partner-connect-support-for-unity-catalog-is-ga)\n+ [Work with large repositories with Sparse Checkout](https://docs.databricks.com/release-notes/product/2022/november.html#work-with-large-repositories-with-sparse-checkout)\n+ [Databricks Terraform provider updated to version 1.6.5](https://docs.databricks.com/release-notes/product/2022/november.html#databricks-terraform-provider-updated-to-version-165)\n+ [Databricks Terraform provider updated to 
versions 1.6.3 and 1.6.4](https://docs.databricks.com/release-notes/product/2022/november.html#databricks-terraform-provider-updated-to-versions-163-and-164)\n+ [Specify a cloud storage location for Unity Catalog managed tables at the catalog and schema levels](https://docs.databricks.com/release-notes/product/2022/november.html#specify-a-cloud-storage-location-for-unity-catalog-managed-tables-at-the-catalog-and-schema-levels)\n+ [Access recent objects from the search field in the top bar of your workspace](https://docs.databricks.com/release-notes/product/2022/november.html#access-recent-objects-from-the-search-field-in-the-top-bar-of-your-workspace)\n+ [Create or modify table from file upload page now supports multiple files](https://docs.databricks.com/release-notes/product/2022/november.html#create-or-modify-table-from-file-upload-page-now-supports-multiple-files)\n+ [Create or modify table from file upload page now supports overwrite](https://docs.databricks.com/release-notes/product/2022/november.html#create-or-modify-table-from-file-upload-page-now-supports-overwrite)\n+ [Search for jobs by name with the Jobs API 2.1](https://docs.databricks.com/release-notes/product/2022/november.html#search-for-jobs-by-name-with-the-jobs-api-21)\n+ [Databricks Terraform provider updated to version 1.6.2](https://docs.databricks.com/release-notes/product/2022/november.html#databricks-terraform-provider-updated-to-version-162)\n+ [Search for tables in Unity Catalog is GA](https://docs.databricks.com/release-notes/product/2022/november.html#search-for-tables-in-unity-catalog-is-ga)\n* [October 2022](https://docs.databricks.com/release-notes/product/2022/october.html)\n+ [GA: Repos support for non-notebook files](https://docs.databricks.com/release-notes/product/2022/october.html#ga-repos-support-for-non-notebook-files)\n+ [Deploy models for streaming inference with Delta Live Tables 
notebooks](https://docs.databricks.com/release-notes/product/2022/october.html#deploy-models-for-streaming-inference-with-delta-live-tables-notebooks)\n+ [Connect to Fivetran from the add data UI](https://docs.databricks.com/release-notes/product/2022/october.html#connect-to-fivetran-from-the-add-data-ui)\n+ [Databricks SQL Driver for Node.js is Generally Available](https://docs.databricks.com/release-notes/product/2022/october.html#databricks-sql-driver-for-nodejs-is-generally-available)\n+ [Partner Connect supports connecting to erwin Data Modeler by Quest](https://docs.databricks.com/release-notes/product/2022/october.html#partner-connect-supports-connecting-to-erwin-data-modeler-by-quest)\n+ [Enforce user isolation cluster types on a workspace](https://docs.databricks.com/release-notes/product/2022/october.html#enforce-user-isolation-cluster-types-on-a-workspace)\n+ [Databricks Runtime 11.3 LTS and 11.3 LTS ML are GA](https://docs.databricks.com/release-notes/product/2022/october.html#databricks-runtime-113-lts-and-113-lts-ml-are-ga)\n+ [Format Python code in notebooks (Public Preview)](https://docs.databricks.com/release-notes/product/2022/october.html#format-python-code-in-notebooks-public-preview)\n+ [IP access lists no longer block PrivateLink traffic](https://docs.databricks.com/release-notes/product/2022/october.html#ip-access-lists-no-longer-block-privatelink-traffic)\n+ [AWS PrivateLink support is now generally available](https://docs.databricks.com/release-notes/product/2022/october.html#aws-privatelink-support-is-now-generally-available)\n+ [Improvements to AWS PrivateLink support for updating workspaces](https://docs.databricks.com/release-notes/product/2022/october.html#improvements-to-aws-privatelink-support-for-updating-workspaces)\n+ [Update a failed workspace with Databricks-managed VPC to use a customer-managed 
VPC](https://docs.databricks.com/release-notes/product/2022/october.html#update-a-failed-workspace-with-databricks-managed-vpc-to-use-a-customer-managed-vpc)\n+ [Personal Compute cluster policy is available by default to all users](https://docs.databricks.com/release-notes/product/2022/october.html#personal-compute-cluster-policy-is-available-by-default-to-all-users)\n+ [Add data UI provides a central UI for loading data to Databricks](https://docs.databricks.com/release-notes/product/2022/october.html#add-data-ui-provides-a-central-ui-for-loading-data-to-databricks)\n+ [Create or modify table from file upload page unifies experience for small file upload to Delta Lake](https://docs.databricks.com/release-notes/product/2022/october.html#create-or-modify-table-from-file-upload-page-unifies-experience-for-small-file-upload-to-delta-lake)\n+ [Partner Connect supports connecting to Hevo Data](https://docs.databricks.com/release-notes/product/2022/october.html#partner-connect-supports-connecting-to-hevo-data)\n+ [Enable admin protection for No Isolation Shared clusters](https://docs.databricks.com/release-notes/product/2022/october.html#enable-admin-protection-for-no-isolation-shared-clusters)\n+ [SQL persona integrated with new search experience](https://docs.databricks.com/release-notes/product/2022/october.html#sql-persona-integrated-with-new-search-experience)\n+ [Databricks is a FedRAMP\u00ae Authorized Cloud Service Offering (CSO) at the moderate impact Level](https://docs.databricks.com/release-notes/product/2022/october.html#databricks-is-a-fedramp\u00ae-authorized-cloud-service-offering-cso-at-the-moderate-impact-level)\n+ [Serverless SQL warehouses are available in regions `eu-central-1` and `us-east-2`](https://docs.databricks.com/release-notes/product/2022/october.html#serverless-sql-warehouses-are-available-in-regions-eu-central-1-and-us-east-2)\n+ [Privilege inheritance is now supported in Unity
Catalog](https://docs.databricks.com/release-notes/product/2022/october.html#privilege-inheritance-in-now-supported-in-unity-catalog)\n+ [Top navigation bar in the UI](https://docs.databricks.com/release-notes/product/2022/october.html#top-navigation-bar-in-the-ui)\n+ [Databricks Runtime 11.3 (Beta)](https://docs.databricks.com/release-notes/product/2022/october.html#databricks-runtime-113-beta)\n+ [The account console is available in multiple languages](https://docs.databricks.com/release-notes/product/2022/october.html#the-account-console-is-available-in-multiple-languages)\n* [September 2022](https://docs.databricks.com/release-notes/product/2022/september.html)\n+ [New reference solution for natural language processing](https://docs.databricks.com/release-notes/product/2022/september.html#new-reference-solution-for-natural-language-processing)\n+ [More regions for Unity Catalog](https://docs.databricks.com/release-notes/product/2022/september.html#more-regions-for-unity-catalog)\n+ [Protect and control access to some types of encrypted data with customer-managed keys (GA)](https://docs.databricks.com/release-notes/product/2022/september.html#protect-and-control-access-to-some-types-of-encrypted-data-with-customer-managed-keys-ga)\n+ [Compliance security profile workspaces support `i4i` instance types](https://docs.databricks.com/release-notes/product/2022/september.html#compliance-security-profile-workspaces-support-i4i-instance-types)\n+ [Audit logs now include events for web terminal](https://docs.databricks.com/release-notes/product/2022/september.html#audit-logs-now-include-events-for-web-terminal)\n+ [Audit logs now include events for managing credentials for Git repos](https://docs.databricks.com/release-notes/product/2022/september.html#audit-logs-now-include-events-for-managing-credentials-for-git-repos)\n+ [Select cluster policies directly in the Delta Live Tables 
UI](https://docs.databricks.com/release-notes/product/2022/september.html#select-cluster-policies-directly-in-the-delta-live-tables-ui)\n+ [New data transformation card on workspace landing pages](https://docs.databricks.com/release-notes/product/2022/september.html#new-data-trasformation-card-on-workspace-landing-pages)\n+ [Orchestrate Databricks SQL tasks in your Databricks workflows (Public Preview)](https://docs.databricks.com/release-notes/product/2022/september.html#orchestrate-databricks-sql-tasks-in-your-databricks-workflows-public-preview)\n+ [Delta cache is now disk cache](https://docs.databricks.com/release-notes/product/2022/september.html#delta-cache-is-now-disk-cache)\n+ [Capture and view lineage data with Unity Catalog (Public Preview)](https://docs.databricks.com/release-notes/product/2022/september.html#capture-and-view-lineage-data-with-unity-catalog-public-preview)\n+ [Search for tables using Catalog Explorer (Public Preview)](https://docs.databricks.com/release-notes/product/2022/september.html#search-for-tables-using-catalog-explorer-public-preview)\n+ [View and organize assets in the workspace browser across personas](https://docs.databricks.com/release-notes/product/2022/september.html#view-and-organize-assets-in-the-workspace-browser-across-personas)\n+ [Databricks Runtime 11.2 and 11.2 ML are GA](https://docs.databricks.com/release-notes/product/2022/september.html#databricks-runtime-112-and-112-ml-are-ga)\n+ [Support for AWS Graviton instances is GA](https://docs.databricks.com/release-notes/product/2022/september.html#support-for-aws-graviton-instances-is-ga)\n* [August 2022](https://docs.databricks.com/release-notes/product/2022/august.html)\n+ [Account users can access the account console](https://docs.databricks.com/release-notes/product/2022/august.html#account-users-can-access-the-account-console)\n+ [Databricks ODBC driver 2.6.26](https://docs.databricks.com/release-notes/product/2022/august.html#databricks-odbc-driver-2626)\n+
[Databricks JDBC driver 2.6.29](https://docs.databricks.com/release-notes/product/2022/august.html#databricks-jdbc-driver-2629)\n+ [Databricks Feature Store client now available on PyPI](https://docs.databricks.com/release-notes/product/2022/august.html#databricks-feature-store-client-now-available-on-pypi)\n+ [Unity Catalog is GA](https://docs.databricks.com/release-notes/product/2022/august.html#unity-catalog-is-ga)\n+ [Delta Sharing is GA](https://docs.databricks.com/release-notes/product/2022/august.html#delta-sharing-is-ga)\n+ [Databricks Runtime 11.2 (Beta)](https://docs.databricks.com/release-notes/product/2022/august.html#databricks-runtime-112-beta)\n+ [Reduced message volume in the Delta Live Tables UI for continuous pipelines](https://docs.databricks.com/release-notes/product/2022/august.html#reduced-message-volume-in-the-delta-live-tables-ui-for-continuous-pipelines)\n+ [Easier cluster configuration for your Delta Live Tables pipelines](https://docs.databricks.com/release-notes/product/2022/august.html#easier-cluster-configuration-for-your-delta-live-tables-pipelines)\n+ [Orchestrate dbt tasks in your Databricks workflows (Public Preview)](https://docs.databricks.com/release-notes/product/2022/august.html#orchestrate-dbt-tasks-in-your-databricks-workflows-public-preview)\n+ [Users can be members of multiple Databricks accounts](https://docs.databricks.com/release-notes/product/2022/august.html#users-can-be-members-of-multiple-databricks-accounts)\n+ [Identity federation is GA](https://docs.databricks.com/release-notes/product/2022/august.html#identity-federation-is-ga)\n+ [Partner Connect supports connecting to Stardog](https://docs.databricks.com/release-notes/product/2022/august.html#partner-connect-supports-connecting-to-stardog)\n+ [Databricks Feature Store integration with Serverless Real-Time Inference](https://docs.databricks.com/release-notes/product/2022/august.html#databricks-feature-store-integration-with-serverless-real-time-inference)\n+ 
[Additional data type support for Databricks Feature Store automatic feature lookup](https://docs.databricks.com/release-notes/product/2022/august.html#additional-data-type-support-for-databricks-feature-store-automatic-feature-lookup)\n+ [Bring your own key: Encrypt Git credentials](https://docs.databricks.com/release-notes/product/2022/august.html#bring-your-own-key-encrypt-git-credentials)\n+ [Cluster UI preview and access mode replaces security mode](https://docs.databricks.com/release-notes/product/2022/august.html#cluster-ui-preview-and-access-mode-replaces-security-mode)\n+ [Unity Catalog limitations (Public Preview)](https://docs.databricks.com/release-notes/product/2022/august.html#unity-catalog-limitations-public-preview)\n+ [Serverless Real-Time Inference in Public Preview](https://docs.databricks.com/release-notes/product/2022/august.html#serverless-real-time-inference-in-public-preview)\n+ [Serverless SQL warehouses improvements](https://docs.databricks.com/release-notes/product/2022/august.html#serverless-sql-warehouses-improvements)\n+ [Share VPC endpoints among Databricks accounts](https://docs.databricks.com/release-notes/product/2022/august.html#share-vpc-endpoints-among-databricks-accounts)\n+ [AWS PrivateLink private access level `ANY` is deprecated](https://docs.databricks.com/release-notes/product/2022/august.html#aws-privatelink-private-access-level-any-is-deprecated)\n+ [Improvements to AWS PrivateLink connectivity](https://docs.databricks.com/release-notes/product/2022/august.html#improvements-to-aws-privatelink-connectivity)\n+ [Improved workspace search is now GA](https://docs.databricks.com/release-notes/product/2022/august.html#improved-workspace-search-is-now-ga)\n+ [Use generated columns when you create Delta Live Tables datasets](https://docs.databricks.com/release-notes/product/2022/august.html#use-generated-columns-when-you-create-delta-live-tables-datasets)\n+ [Improved editing for notebooks with Monaco-based editor 
(Experimental)](https://docs.databricks.com/release-notes/product/2022/august.html#improved-editing-for-notebooks-with-monaco-based-editor-experimental)\n+ [Compliance controls FedRAMP Moderate, PCI-DSS, and HIPAA (GA)](https://docs.databricks.com/release-notes/product/2022/august.html#compliance-controls-fedramp-moderate-pci-dss-and-hipaa-ga)\n+ [Add security controls with the compliance security profile (GA)](https://docs.databricks.com/release-notes/product/2022/august.html#add-security-controls-with-the-compliance-security-profile-ga)\n+ [Add image hardening and monitoring agents with enhanced security monitoring (GA)](https://docs.databricks.com/release-notes/product/2022/august.html#add-image-hardening-and-monitoring-agents-with-enhanced-security-monitoring-ga)\n+ [Databricks Runtime 10.3 series support ends](https://docs.databricks.com/release-notes/product/2022/august.html#databricks-runtime-103-series-support-ends)\n+ [Delta Live Tables now supports refreshing only selected tables in pipeline updates](https://docs.databricks.com/release-notes/product/2022/august.html#delta-live-tables-now-supports-refreshing-only-selected-tables-in-pipeline-updates)\n+ [Job execution now waits for cluster libraries to finish installing](https://docs.databricks.com/release-notes/product/2022/august.html#job-execution-now-waits-for-cluster-libraries-to-finish-installing)\n* [July 2022](https://docs.databricks.com/release-notes/product/2022/july.html)\n+ [Databricks Runtime 11.1 and 11.1 ML are GA](https://docs.databricks.com/release-notes/product/2022/july.html#databricks-runtime-111-and-111-ml-are-ga)\n+ [Photon is GA](https://docs.databricks.com/release-notes/product/2022/july.html#photon-is-ga)\n+ [Notification upon notebook completion](https://docs.databricks.com/release-notes/product/2022/july.html#notification-upon-notebook-completion)\n+ [Increased limit for the number of jobs in your Databricks 
workspaces](https://docs.databricks.com/release-notes/product/2022/july.html#increased-limit-for-the-number-of-jobs-in-your-databricks-workspaces)\n+ [Verbose audit logs now record when Databricks SQL queries are run](https://docs.databricks.com/release-notes/product/2022/july.html#verbose-audit-logs-now-record-when-databricks-sql-queries-are-run)\n+ [Databricks SQL Serverless supports instance profiles whose name does not match its associated role](https://docs.databricks.com/release-notes/product/2022/july.html#databricks-sql-serverless-supports-instance-profiles-whose-name-does-not-match-its-associated-role)\n+ [Configure your workspace to use IMDS v2 (Public Preview)](https://docs.databricks.com/release-notes/product/2022/july.html#configure-your-workspace-to-use-imds-v2-public-preview)\n+ [Databricks JDBC driver 2.6.27](https://docs.databricks.com/release-notes/product/2022/july.html#databricks-jdbc-driver-2627)\n+ [Databricks ODBC driver 2.6.25](https://docs.databricks.com/release-notes/product/2022/july.html#databricks-odbc-driver-2625)\n+ [Databricks Runtime 11.1 (Beta)](https://docs.databricks.com/release-notes/product/2022/july.html#databricks-runtime-111-beta)\n+ [Improved notebook visualizations](https://docs.databricks.com/release-notes/product/2022/july.html#improved-notebook-visualizations)\n* [June 2022](https://docs.databricks.com/release-notes/product/2022/june.html)\n+ [`ALTER TABLE` permission changes for Unity Catalog](https://docs.databricks.com/release-notes/product/2022/june.html#alter-table-permission-changes-for-unity-catalog)\n+ [Databricks Runtime 6.4 Extended Support reaches end of support](https://docs.databricks.com/release-notes/product/2022/june.html#databricks-runtime-64-extended-support-reaches-end-of-support)\n+ [Databricks Runtime 10.2 series support ends](https://docs.databricks.com/release-notes/product/2022/june.html#databricks-runtime-102-series-support-ends)\n+ [Databricks ODBC driver 
2.6.24](https://docs.databricks.com/release-notes/product/2022/june.html#databricks-odbc-driver-2624)\n+ [Databricks Terraform provider is now GA](https://docs.databricks.com/release-notes/product/2022/june.html#databricks-terraform-provider-is-now-ga)\n+ [Serverless SQL warehouses available for E2 workspaces (Public Preview)](https://docs.databricks.com/release-notes/product/2022/june.html#serverless-sql-warehouses-available-for-e2-workspaces-public-preview)\n+ [Enable enhanced security controls with a security profile (Public Preview)](https://docs.databricks.com/release-notes/product/2022/june.html#enable-enhanced-security-controls-with-a-security-profile-public-preview)\n+ [PCI-DSS compliance controls (Public Preview)](https://docs.databricks.com/release-notes/product/2022/june.html#pci-dss-compliance-controls-public-preview)\n+ [HIPAA compliance controls for E2 (Public Preview)](https://docs.databricks.com/release-notes/product/2022/june.html#hipaa-compliance-controls-for-e2-public-preview)\n+ [Enhanced security monitoring (Public Preview)](https://docs.databricks.com/release-notes/product/2022/june.html#enhanced-security-monitoring-public-preview)\n+ [Databricks Runtime 11.0 and 11.0 ML are GA; 11.0 Photon is Public Preview](https://docs.databricks.com/release-notes/product/2022/june.html#databricks-runtime-110-and-110-ml-are-ga-110-photon-is-public-preview)\n+ [Change to Repos default working directory in Databricks Runtime 11.0](https://docs.databricks.com/release-notes/product/2022/june.html#change-to-repos-default-working-directory-in-databricks-runtime-110)\n+ [Databricks Runtime 10.1 series support ends](https://docs.databricks.com/release-notes/product/2022/june.html#databricks-runtime-101-series-support-ends)\n+ [Audit logs can now record when a notebook command is run](https://docs.databricks.com/release-notes/product/2022/june.html#audit-logs-can-now-record-when-a-notebook-command-is-run)\n+ [Delta Live Tables now supports SCD type 
2](https://docs.databricks.com/release-notes/product/2022/june.html#delta-live-tables-now-supports-scd-type-2)\n+ [Create Delta Live Tables pipelines directly in the Databricks UI](https://docs.databricks.com/release-notes/product/2022/june.html#create-delta-live-tables-pipelines-directly-in-the-databricks-ui)\n+ [Select the Delta Live Tables channel when you create or edit a pipeline](https://docs.databricks.com/release-notes/product/2022/june.html#select-the-delta-live-tables-channel-when-you-create-or-edit-a-pipeline)\n+ [Communicate between tasks in your Databricks jobs with task values](https://docs.databricks.com/release-notes/product/2022/june.html#communicate-between-tasks-in-your-databricks-jobs-with-task-values)\n+ [Enable account switching in the Databricks UI](https://docs.databricks.com/release-notes/product/2022/june.html#enable-account-switching-in-the-databricks-ui)\n+ [Updating the AWS Region for a failed workspace is no longer supported](https://docs.databricks.com/release-notes/product/2022/june.html#updating-the-aws-region-for-a-failed-workspace-is-no-longer-supported)\n* [May 2022](https://docs.databricks.com/release-notes/product/2022/may.html)\n+ [Copy and paste notebook cells between tabs and windows](https://docs.databricks.com/release-notes/product/2022/may.html#copy-and-paste-notebook-cells-between-tabs-and-windows)\n+ [Additional data type support for Databricks Feature Store automatic feature lookup](https://docs.databricks.com/release-notes/product/2022/may.html#additional-data-type-support-for-databricks-feature-store-automatic-feature-lookup)\n+ [Databricks Runtime 11.0 (Beta)](https://docs.databricks.com/release-notes/product/2022/may.html#databricks-runtime-110-beta)\n+ [Improved workspace search (Public Preview)](https://docs.databricks.com/release-notes/product/2022/may.html#improved-workspace-search-public-preview)\n+ [Explore SQL cell results in Python notebooks natively using 
Python](https://docs.databricks.com/release-notes/product/2022/may.html#explore-sql-cell-results-in-python-notebooks-natively-using-python)\n+ [Databricks Repos: Support for more files in a repo](https://docs.databricks.com/release-notes/product/2022/may.html#databricks-repos-support-for-more-files-in-a-repo)\n+ [Databricks Repos: Fix to issue with MLflow experiment data loss](https://docs.databricks.com/release-notes/product/2022/may.html#databricks-repos-fix-to-issue-with-mlflow-experiment-data-loss)\n+ [Upgrade wizard makes it easier to copy databases and multiple tables to Unity Catalog (Public Preview)](https://docs.databricks.com/release-notes/product/2022/may.html#upgrade-wizard-makes-it-easier-to-copy-databases-and-multiple-tables-to-unity-catalog-public-preview)\n+ [Power BI Desktop system-wide HTTP proxy support](https://docs.databricks.com/release-notes/product/2022/may.html#power-bi-desktop-system-wide-http-proxy-support)\n+ [Streamline billing and account management by signing up for Databricks using AWS Marketplace](https://docs.databricks.com/release-notes/product/2022/may.html#streamline-billing-and-account-management-by-signing-up-for-databricks-using-aws-marketplace)\n+ [Databricks Runtime 10.5 and 10.5 ML are GA; 10.5 Photon is Public Preview](https://docs.databricks.com/release-notes/product/2022/may.html#databricks-runtime-105-and-105-ml-are-ga-105-photon-is-public-preview)\n+ [Authenticate to the account console using SAML 2.0 (Public Preview)](https://docs.databricks.com/release-notes/product/2022/may.html#authenticate-to-the-account-console-using-saml-20-public-preview)\n+ [Databricks JDBC driver 2.6.25](https://docs.databricks.com/release-notes/product/2022/may.html#databricks-jdbc-driver-2625)\n+ [See the user a pipeline runs as in the Delta Live Tables UI](https://docs.databricks.com/release-notes/product/2022/may.html#see-the-user-a-pipeline-runs-as-in-the-delta-live-tables-ui)\n* [April 
2022](https://docs.databricks.com/release-notes/product/2022/april.html)\n+ [Use tags to better manage your Databricks jobs](https://docs.databricks.com/release-notes/product/2022/april.html#use-tags-to-better-manage-your-databricks-jobs)\n+ [Databricks Runtime 10.0 series support ends](https://docs.databricks.com/release-notes/product/2022/april.html#databricks-runtime-100-series-support-ends)\n+ [Get a visual overview of your job runs with the new jobs matrix view](https://docs.databricks.com/release-notes/product/2022/april.html#get-a-visual-overview-of-your-job-runs-with-the-new-jobs-matrix-view)\n+ [Save time and resources when your Databricks job runs are unsuccessful](https://docs.databricks.com/release-notes/product/2022/april.html#save-time-and-resources-when-your-databricks-job-runs-are-unsuccessful)\n+ [View the run history for job tasks](https://docs.databricks.com/release-notes/product/2022/april.html#view-the-run-history-for-job-tasks)\n+ [Assign a new cluster in the jobs UI when the Single User access no longer exists](https://docs.databricks.com/release-notes/product/2022/april.html#assign-a-new-cluster-in-the-jobs-ui-when-the-single-user-access-no-longer-exists)\n+ [Databricks Runtime 10.5 (Beta)](https://docs.databricks.com/release-notes/product/2022/april.html#databricks-runtime-105-beta)\n+ [Feature Store now supports publishing features to AWS DynamoDB](https://docs.databricks.com/release-notes/product/2022/april.html#feature-store-now-supports-publishing-features-to-aws-dynamodb)\n+ [The Delta Live Tables UI is enhanced to disable unauthorized actions](https://docs.databricks.com/release-notes/product/2022/april.html#the-delta-live-tables-ui-is-enhanced-to-disable-unauthorized-actions)\n+ [Databricks AutoML is generally available](https://docs.databricks.com/release-notes/product/2022/april.html#databricks-automl-is-generally-available)\n+ [Use datasets from Unity Catalog with 
AutoML](https://docs.databricks.com/release-notes/product/2022/april.html#use-datasets-from-unity-catalog-with-automl)\n+ [Delta Live Tables is GA on AWS and Azure, and in Public Preview on GCP](https://docs.databricks.com/release-notes/product/2022/april.html#delta-live-tables-is-ga-on-aws-and-azure-and-in-public-preview-on-gcp)\n+ [Delta Live Tables SQL interface: non-breaking change to table names](https://docs.databricks.com/release-notes/product/2022/april.html#delta-live-tables-sql-interface-non-breaking-change-to-table-names)\n* [March 2022](https://docs.databricks.com/release-notes/product/2022/march.html)\n+ [Better performance and cost for your Delta Live Tables pipelines with Databricks Enhanced Autoscaling](https://docs.databricks.com/release-notes/product/2022/march.html#better-performance-and-cost-for-your-delta-live-tables-pipelines-with-databricks-enhanced-autoscaling)\n+ [Files in Repos enabled by default in new workspaces](https://docs.databricks.com/release-notes/product/2022/march.html#files-in-repos-enabled-by-default-in-new-workspaces)\n+ [Databricks Feature Store is generally available](https://docs.databricks.com/release-notes/product/2022/march.html#databricks-feature-store-is-generally-available)\n+ [Share an experiment from the experiment page](https://docs.databricks.com/release-notes/product/2022/march.html#share-an-experiment-from-the-experiment-page)\n+ [RStudio Workbench bug fix](https://docs.databricks.com/release-notes/product/2022/march.html#rstudio-workbench-bug-fix)\n+ [New workspace language options](https://docs.databricks.com/release-notes/product/2022/march.html#new-workspace-language-options)\n+ [Databricks Runtime 10.4 LTS and 10.4 LTS ML are GA; 10.4 Photon is Public Preview](https://docs.databricks.com/release-notes/product/2022/march.html#databricks-runtime-104-lts-and-104-lts-ml-are-ga-104-photon-is-public-preview)\n+ [Unity Catalog is available in Public 
Preview](https://docs.databricks.com/release-notes/product/2022/march.html#unity-catalog-is-available-in-public-preview)\n+ [Delta Sharing is available in Public Preview](https://docs.databricks.com/release-notes/product/2022/march.html#delta-sharing-is-available-in-public-preview)\n+ [Enhanced access control for Delta Live Tables pipelines](https://docs.databricks.com/release-notes/product/2022/march.html#enhanced-access-control-for-delta-live-tables-pipelines)\n+ [Test Delta Live Tables preview functionality with the new `channel` setting (Public Preview)](https://docs.databricks.com/release-notes/product/2022/march.html#test-delta-live-tables-preview-functionality-with-the-new-channel-setting-public-preview)\n+ [Improved error handling for Delta Live Tables Python functions (Public Preview)](https://docs.databricks.com/release-notes/product/2022/march.html#improved-error-handling-for-delta-live-tables-python-functions-public-preview)\n+ [Improvements to Databricks Repos](https://docs.databricks.com/release-notes/product/2022/march.html#improvements-to-databricks-repos)\n+ [Audit logging for cluster policy changes](https://docs.databricks.com/release-notes/product/2022/march.html#audit-logging-for-cluster-policy-changes)\n+ [Databricks Runtime 10.4 (Beta)](https://docs.databricks.com/release-notes/product/2022/march.html#databricks-runtime-104-beta)\n* [February 2022](https://docs.databricks.com/release-notes/product/2022/february.html)\n+ [Easier scheduling for your Delta Live Tables pipelines (Public Preview)](https://docs.databricks.com/release-notes/product/2022/february.html#easier-scheduling-for-your-delta-live-tables-pipelines-public-preview)\n+ [Easily browse the history of your Delta Live Tables pipeline updates (Public Preview)](https://docs.databricks.com/release-notes/product/2022/february.html#easily-browse-the-history-of-your-delta-live-tables-pipeline-updates-public-preview)\n+ [Ensure job idempotency for the Jobs API Run now 
request](https://docs.databricks.com/release-notes/product/2022/february.html#ensure-job-idempotency-for-the-jobs-api-run-now-request)\n+ [Jobs service stability and scalability improvements](https://docs.databricks.com/release-notes/product/2022/february.html#jobs-service-stability-and-scalability-improvements)\n+ [Compare MLflow runs from different experiments](https://docs.databricks.com/release-notes/product/2022/february.html#compare-mlflow-runs-from-different-experiments)\n+ [Improvements to MLflow compare runs display](https://docs.databricks.com/release-notes/product/2022/february.html#improvements-to-mlflow-compare-runs-display)\n+ [Improved visibility into job run owners in the clusters UI](https://docs.databricks.com/release-notes/product/2022/february.html#improved-visibility-into-job-run-owners-in-the-clusters-ui)\n+ [Drop dataset columns in AutoML](https://docs.databricks.com/release-notes/product/2022/february.html#drop-dataset-columns-in-automl)\n+ [Experiments page is GA](https://docs.databricks.com/release-notes/product/2022/february.html#experiments-page-is-ga)\n+ [Support for temporary tables in the Delta Live Tables Python interface](https://docs.databricks.com/release-notes/product/2022/february.html#support-for-temporary-tables-in-the-delta-live-tables-python-interface)\n+ [User interface improvements for Delta Live Tables (Public Preview)](https://docs.databricks.com/release-notes/product/2022/february.html#user-interface-improvements-for-delta-live-tables-public-preview)\n+ [Databricks Runtime 9.0 series support ends](https://docs.databricks.com/release-notes/product/2022/february.html#databricks-runtime-90-series-support-ends)\n+ [Data Science & Engineering landing page updates](https://docs.databricks.com/release-notes/product/2022/february.html#data-science--engineering-landing-page-updates)\n+ [Databricks Repos now supports AWS CodeCommit for Git 
integration](https://docs.databricks.com/release-notes/product/2022/february.html#databricks-repos-now-supports-aws-codecommit-for-git-integration)\n+ [Improved visualization for your Delta Live Tables pipelines (Public Preview)](https://docs.databricks.com/release-notes/product/2022/february.html#improved-visualization-for-your-delta-live-tables-pipelines-public-preview)\n+ [Updated Markdown parser](https://docs.databricks.com/release-notes/product/2022/february.html#updated-markdown-parser)\n+ [Delta Live Tables now supports change data capture processing (Public Preview)](https://docs.databricks.com/release-notes/product/2022/february.html#delta-live-tables-now-supports-change-data-capture-processing-public-preview)\n+ [Select algorithm frameworks to use with AutoML](https://docs.databricks.com/release-notes/product/2022/february.html#select-algorithm-frameworks-to-use-with-automl)\n+ [Customer-managed VPCs are now available in ap-northeast-2](https://docs.databricks.com/release-notes/product/2022/february.html#customer-managed-vpcs-are-now-available-in-ap-northeast-2)\n+ [Databricks hosted MLflow models can now look up features from online stores](https://docs.databricks.com/release-notes/product/2022/february.html#databricks-hosted-mlflow-models-can-now-look-up-features-from-online-stores)\n+ [Databricks Runtime 10.3 and 10.3 ML are GA; 10.3 Photon is Public Preview](https://docs.databricks.com/release-notes/product/2022/february.html#databricks-runtime-103-and-103-ml-are-ga-103-photon-is-public-preview)\n* [January 2022](https://docs.databricks.com/release-notes/product/2022/january.html)\n+ [MLflow Model Registry Webhooks on Databricks (Public Preview)](https://docs.databricks.com/release-notes/product/2022/january.html#mlflow-model-registry-webhooks-on-databricks-public-preview)\n+ [Breaking change: cluster idempotency token cleared on cluster 
termination](https://docs.databricks.com/release-notes/product/2022/january.html#breaking-change-cluster-idempotency-token-cleared-on-cluster-termination)\n+ [Databricks Runtime 10.3 (Beta)](https://docs.databricks.com/release-notes/product/2022/january.html#databricks-runtime-103-beta)\n+ [View information on recent job runs](https://docs.databricks.com/release-notes/product/2022/january.html#view-information-on-recent-job-runs)\n+ [Use Markdown in Databricks Repos file editor](https://docs.databricks.com/release-notes/product/2022/january.html#use-markdown-in-databricks-repos-file-editor)\n+ [Improved cluster management for jobs that orchestrate multiple tasks](https://docs.databricks.com/release-notes/product/2022/january.html#improved-cluster-management-for-jobs-that-orchestrate-multiple-tasks)\n+ [Add or rotate the customer-managed key for managed services on a running workspace](https://docs.databricks.com/release-notes/product/2022/january.html#add-or-rotate-the-customer-managed-key-for-managed-services-on-a-running-workspace)\n+ [Delta Sharing Private Preview adds functionality and new terms](https://docs.databricks.com/release-notes/product/2022/january.html#delta-sharing-private-preview-adds-functionality-and-new-terms)\n+ [Address AWS GuardDuty alerts related to Databricks access to your S3 bucket](https://docs.databricks.com/release-notes/product/2022/january.html#address-aws-guardduty-alerts-related-to-databricks-access-to-your-s3-bucket)\n+ [Databricks Runtime 8.3 and Databricks Runtime 8.4 series support ends](https://docs.databricks.com/release-notes/product/2022/january.html#databricks-runtime-83-and-databricks-runtime-84-series-support-ends)\n+ [Databricks JDBC driver 2.6.22](https://docs.databricks.com/release-notes/product/2022/january.html#databricks-jdbc-driver-2622)\n+ [Support for G5 family of GPU-accelerated EC2 instances (Public 
Preview)](https://docs.databricks.com/release-notes/product/2022/january.html#support-for-g5-family-of-gpu-accelerated-ec2-instances-public-preview)\n+ [New Share button replaces Permissions icon in notebooks](https://docs.databricks.com/release-notes/product/2022/january.html#new-share-button-replaces-permissions-icon-in-notebooks)\n+ [New workspace language options](https://docs.databricks.com/release-notes/product/2022/january.html#new-workspace-language-options)\n* [December 2021](https://docs.databricks.com/release-notes/product/2021/december.html)\n+ [Databricks Runtime 6.4 Extended Support series end-of-support date extended](https://docs.databricks.com/release-notes/product/2021/december.html#databricks-runtime-64-extended-support-series-end-of-support-date-extended)\n+ [Databricks Runtime 5.5 Extended Support series reaches end of support](https://docs.databricks.com/release-notes/product/2021/december.html#databricks-runtime-55-extended-support-series-reaches-end-of-support)\n+ [Databricks JDBC driver 2.6.21](https://docs.databricks.com/release-notes/product/2021/december.html#databricks-jdbc-driver-2621)\n+ [Databricks Connector for Tableau 2021.4](https://docs.databricks.com/release-notes/product/2021/december.html#databricks-connector-for-tableau-20214)\n+ [Databricks Runtime 10.2 and 10.2 ML are GA; 10.2 Photon is Public Preview](https://docs.databricks.com/release-notes/product/2021/december.html#databricks-runtime-102-and-102-ml-are-ga-102-photon-is-public-preview)\n+ [Workspaces in the `ap-southeast-1` region now support AWS PrivateLink](https://docs.databricks.com/release-notes/product/2021/december.html#workspaces-in-the-ap-southeast-1-region-now-support-aws-privatelink)\n+ [Updated Markdown parser](https://docs.databricks.com/release-notes/product/2021/december.html#updated-markdown-parser)\n+ [User interface improvements for Delta Live 
Tables](https://docs.databricks.com/release-notes/product/2021/december.html#user-interface-improvements-for-delta-live-tables)\n+ [Databricks Runtime 8.3 series support extended](https://docs.databricks.com/release-notes/product/2021/december.html#databricks-runtime-83-series-support-extended)\n+ [Databricks Runtime 10.2 (Beta)](https://docs.databricks.com/release-notes/product/2021/december.html#databricks-runtime-102-beta)\n+ [Revert of recent breaking change that removed escaping and quotes from $ in environment variable values for cluster creation](https://docs.databricks.com/release-notes/product/2021/december.html#revert-of-recent-breaking-change-that-removed-escaping-and-quotes-from--in-environment-variable-values-for-cluster-creation)\n+ [Serverless SQL warehouses are available in region `eu-west-1`](https://docs.databricks.com/release-notes/product/2021/december.html#serverless-sql-warehouses-are-available-in-region-eu-west-1)\n* [November 2021](https://docs.databricks.com/release-notes/product/2021/november.html)\n+ [Create tags for feature tables (Public Preview)](https://docs.databricks.com/release-notes/product/2021/november.html#create-tags-for-feature-tables-public-preview)\n+ [Syntax highlighting and autocomplete for SQL commands in Python cells](https://docs.databricks.com/release-notes/product/2021/november.html#syntax-highlighting-and-autocomplete-for-sql-commands-in-python-cells)\n+ [Rename, delete, and change permissions for MLflow experiments from experiment page (Public Preview)](https://docs.databricks.com/release-notes/product/2021/november.html#rename-delete-and-change-permissions-for-mlflow-experiments-from-experiment-page-public-preview)\n+ [New data profiles in notebooks: tabular and graphic summaries of your data (Public Preview)](https://docs.databricks.com/release-notes/product/2021/november.html#new-data-profiles-in-notebooks-tabular-and-graphic-summaries-of-your-data-public-preview)\n+ [Improved logging when schemas evolve while 
running a Delta Live Tables pipeline](https://docs.databricks.com/release-notes/product/2021/november.html#improved-logging-when-schemas-evolve-while-running-a-delta-live-tables-pipeline)\n+ [Databricks Partner Connect GA](https://docs.databricks.com/release-notes/product/2021/november.html#databricks-partner-connect-ga)\n+ [Breaking change: remove escaping and quotes from $ in environment variable values for cluster creation](https://docs.databricks.com/release-notes/product/2021/november.html#breaking-change-remove-escaping-and-quotes-from--in-environment-variable-values-for-cluster-creation)\n+ [Ease of use improvements for Files in Repos](https://docs.databricks.com/release-notes/product/2021/november.html#ease-of-use-improvements-for-files-in-repos)\n+ [Support for legacy SQL widgets ends on January 15, 2022](https://docs.databricks.com/release-notes/product/2021/november.html#support-for-legacy-sql-widgets-ends-on-january-15-2022)\n+ [User interface improvements for Databricks jobs](https://docs.databricks.com/release-notes/product/2021/november.html#user-interface-improvements-for-databricks-jobs)\n+ [Delta Sharing Connector for Power BI](https://docs.databricks.com/release-notes/product/2021/november.html#delta-sharing-connector-for-power-bi)\n+ [Databricks ODBC driver 2.6.19](https://docs.databricks.com/release-notes/product/2021/november.html#databricks-odbc-driver-2619)\n+ [Databricks Runtime 10.1 and 10.1 ML are GA; 10.1 Photon is Public Preview](https://docs.databricks.com/release-notes/product/2021/november.html#databricks-runtime-101-and-101-ml-are-ga-101-photon-is-public-preview)\n+ [Databricks Runtime 10.1 (Beta)](https://docs.databricks.com/release-notes/product/2021/november.html#databricks-runtime-101-beta)\n+ [Rename and delete MLflow experiments (Public Preview)](https://docs.databricks.com/release-notes/product/2021/november.html#rename-and-delete-mlflow-experiments-public-preview)\n+ [Photon support for additional cluster instance 
families](https://docs.databricks.com/release-notes/product/2021/november.html#photon-support-for-additional-cluster-instance-families)\n+ [You can now create a cluster policy by cloning an existing policy](https://docs.databricks.com/release-notes/product/2021/november.html#you-can-now-create-a-cluster-policy-by-cloning-an-existing-policy)\n+ [Single sign-on (SSO) in the account console is Generally Available](https://docs.databricks.com/release-notes/product/2021/november.html#single-sign-on-sso-in-the-account-console-is-generally-available)\n+ [Change the default language of notebooks and notebook cells more easily](https://docs.databricks.com/release-notes/product/2021/november.html#change-the-default-language-of-notebooks-and-notebook-cells-more-easily)\n+ [Use Files in Repos from the web terminal](https://docs.databricks.com/release-notes/product/2021/november.html#use-files-in-repos-from-the-web-terminal)\n* [October 2021](https://docs.databricks.com/release-notes/product/2021/october.html)\n+ [Databricks Runtime 8.2 series support ends](https://docs.databricks.com/release-notes/product/2021/october.html#databricks-runtime-82-series-support-ends)\n+ [Databricks Runtime 10.0 and 10.0 ML are GA; 10.0 Photon is Public Preview](https://docs.databricks.com/release-notes/product/2021/october.html#databricks-runtime-100-and-100-ml-are-ga-100-photon-is-public-preview)\n+ [Limit the set of VPC endpoints your workspace can use for AWS PrivateLink connections (Public Preview)](https://docs.databricks.com/release-notes/product/2021/october.html#limit-the-set-of-vpc-endpoints-your-workspace-can-use-for-aws-privatelink-connections-public-preview)\n+ [User interface improvements for Delta Live Tables (Public Preview)](https://docs.databricks.com/release-notes/product/2021/october.html#user-interface-improvements-for-delta-live-tables-public-preview)\n+ [Specify a fixed-size cluster when you create a new pipeline in Delta Live Tables (Public 
Preview)](https://docs.databricks.com/release-notes/product/2021/october.html#specify-a-fixed-size-cluster-when-you-create-a-new-pipeline-in-delta-live-tables-public-preview)\n+ [View data quality metrics for tables in Delta Live Tables triggered pipelines (Public Preview)](https://docs.databricks.com/release-notes/product/2021/october.html#view-data-quality-metrics-for-tables-in-delta-live-tables-triggered-pipelines-public-preview)\n+ [Jobs orchestration is now GA](https://docs.databricks.com/release-notes/product/2021/october.html#jobs-orchestration-is-now-ga)\n+ [Databricks Connector for Power BI](https://docs.databricks.com/release-notes/product/2021/october.html#databricks-connector-for-power-bi)\n+ [Repos now supports arbitrary file types](https://docs.databricks.com/release-notes/product/2021/october.html#repos-now-supports-arbitrary-file-types)\n+ [More detailed job run output with the Jobs API](https://docs.databricks.com/release-notes/product/2021/october.html#more-detailed-job-run-output-with-the-jobs-api)\n+ [Improved readability of notebook paths in the Jobs UI](https://docs.databricks.com/release-notes/product/2021/october.html#improved-readability-of-notebook-paths-in-the-jobs-ui)\n+ [Open your Delta Live Tables pipeline in a new tab or window](https://docs.databricks.com/release-notes/product/2021/october.html#open-your-delta-live-tables-pipeline-in-a-new-tab-or-window)\n+ [New escape sequence for `$` in legacy input widgets in SQL](https://docs.databricks.com/release-notes/product/2021/october.html#new-escape-sequence-for--in-legacy-input-widgets-in-sql)\n+ [Faster model deployment with automatically generated batch inference notebook](https://docs.databricks.com/release-notes/product/2021/october.html#faster-model-deployment-with-automatically-generated-batch-inference-notebook)\n* [September 2021](https://docs.databricks.com/release-notes/product/2021/september.html)\n+ [Databricks Runtime 10.0 
(Beta)](https://docs.databricks.com/release-notes/product/2021/september.html#databricks-runtime-100-beta)\n+ [Databricks ODBC driver 2.6.18](https://docs.databricks.com/release-notes/product/2021/september.html#databricks-odbc-driver-2618)\n+ [Customer control of workspace login by Databricks staff](https://docs.databricks.com/release-notes/product/2021/september.html#customer-control-of-workspace-login-by-databricks-staff)\n+ [Databricks Runtime 9.1 LTS and 9.1 LTS ML are GA; 9.1 LTS Photon is Public Preview](https://docs.databricks.com/release-notes/product/2021/september.html#databricks-runtime-91-lts-and-91-lts-ml-are-ga-91-lts-photon-is-public-preview)\n+ [Databricks Runtime 8.1 series support ends](https://docs.databricks.com/release-notes/product/2021/september.html#databricks-runtime-81-series-support-ends)\n+ [Databricks JDBC driver 2.6.19](https://docs.databricks.com/release-notes/product/2021/september.html#databricks-jdbc-driver-2619)\n+ [Share feature tables across workspaces](https://docs.databricks.com/release-notes/product/2021/september.html#share-feature-tables-across-workspaces)\n+ [Security and usability improvements when resetting passwords](https://docs.databricks.com/release-notes/product/2021/september.html#security-and-usability-improvements-when-resetting-passwords)\n+ [Repos now supports `.gitignore`](https://docs.databricks.com/release-notes/product/2021/september.html#repos-now-supports-gitignore)\n+ [Enhanced jobs UI is now standard for all workspaces](https://docs.databricks.com/release-notes/product/2021/september.html#enhanced-jobs-ui-is-now-standard-for-all-workspaces)\n+ [Databricks now available in region `ap-southeast-1`](https://docs.databricks.com/release-notes/product/2021/september.html#databricks-now-available-in-region-ap-southeast-1)\n+ [PrivateLink supported in all availability zones within the supported 
regions](https://docs.databricks.com/release-notes/product/2021/september.html#privatelink-supported-in-all-availability-zones-within-the-supported-regions)\n+ [Databricks Runtime 9.1 (Beta)](https://docs.databricks.com/release-notes/product/2021/september.html#databricks-runtime-91-beta)\n+ [Streamlined management of settings for Databricks jobs](https://docs.databricks.com/release-notes/product/2021/september.html#streamlined-management-of-settings-for-databricks-jobs)\n+ [Databricks SQL Public Preview available in all workspaces](https://docs.databricks.com/release-notes/product/2021/september.html#databricks-sql-public-preview-available-in-all-workspaces)\n+ [Delete feature tables from Feature Store](https://docs.databricks.com/release-notes/product/2021/september.html#delete-feature-tables-from-feature-store)\n+ [Grant view pipeline permissions in the Delta Live Tables UI (Public Preview)](https://docs.databricks.com/release-notes/product/2021/september.html#grant-view-pipeline-permissions-in-the-delta-live-tables-ui-public-preview)\n+ [Reduce cluster resource usage with Delta Live Tables (Public Preview)](https://docs.databricks.com/release-notes/product/2021/september.html#reduce-cluster-resource-usage-with-delta-live-tables-public-preview)\n+ [Use MLflow models in your Delta Live Tables pipelines (Public Preview)](https://docs.databricks.com/release-notes/product/2021/september.html#use-mlflow-models-in-your-delta-live-tables-pipelines-public-preview)\n+ [Find Delta Live Tables pipelines by name (Public Preview)](https://docs.databricks.com/release-notes/product/2021/september.html#find-delta-live-tables-pipelines-by-name-public-preview)\n+ [PyTorch TorchScript and other third-party libraries are now supported in Databricks jobs](https://docs.databricks.com/release-notes/product/2021/september.html#pytorch-torchscript-and-other-third-party-libraries-are-now-supported-in-databricks-jobs)\n+ [Databricks Runtime 8.0 series support 
ends](https://docs.databricks.com/release-notes/product/2021/september.html#databricks-runtime-80-series-support-ends)\n* [August 2021](https://docs.databricks.com/release-notes/product/2021/august.html)\n+ [Databricks Repos GA](https://docs.databricks.com/release-notes/product/2021/august.html#databricks-repos-ga)\n+ [Serverless SQL provides instant compute, minimal management, and cost optimization for SQL queries (Public Preview)](https://docs.databricks.com/release-notes/product/2021/august.html#serverless-sql-provides-instant-compute-minimal-management-and-cost-optimization-for-sql-queries-public-preview)\n+ [User interface improvements for Delta Live Tables (Public Preview)](https://docs.databricks.com/release-notes/product/2021/august.html#user-interface-improvements-for-delta-live-tables-public-preview)\n+ [More control over how tables are materialized in Delta Live Tables pipelines (Public Preview)](https://docs.databricks.com/release-notes/product/2021/august.html#more-control-over-how-tables-are-materialized-in-delta-live-tables-pipelines-public-preview)\n+ [Increased timeout for long-running notebook jobs](https://docs.databricks.com/release-notes/product/2021/august.html#increased-timeout-for-long-running-notebook-jobs)\n+ [Jobs service stability and scalability improvements](https://docs.databricks.com/release-notes/product/2021/august.html#jobs-service-stability-and-scalability-improvements)\n+ [User entitlements granted by group membership are displayed in the admin console](https://docs.databricks.com/release-notes/product/2021/august.html#user-entitlements-granted-by-group-membership-are-displayed-in-the-admin-console)\n+ [Manage MLflow experiment permissions (Public Preview)](https://docs.databricks.com/release-notes/product/2021/august.html#manage-mlflow-experiment-permissions-public-preview)\n+ [Improved job creation from notebooks](https://docs.databricks.com/release-notes/product/2021/august.html#improved-job-creation-from-notebooks)\n+ 
[Improved support for collapsing notebook headings](https://docs.databricks.com/release-notes/product/2021/august.html#improved-support-for-collapsing-notebook-headings)\n+ [Databricks Runtime 9.0 and 9.0 ML are GA; 9.0 Photon is Public Preview](https://docs.databricks.com/release-notes/product/2021/august.html#databricks-runtime-90-and-90-ml-are-ga-90-photon-is-public-preview)\n+ [Low-latency delivery of audit logs is generally available](https://docs.databricks.com/release-notes/product/2021/august.html#low-latency-delivery-of-audit-logs-is-generally-available)\n+ [Databricks Runtime 9.0 (Beta)](https://docs.databricks.com/release-notes/product/2021/august.html#databricks-runtime-90-beta)\n+ [Manage repos programmatically with the Databricks CLI (Public Preview)](https://docs.databricks.com/release-notes/product/2021/august.html#manage-repos-programmatically-with-the-databricks-cli-public-preview)\n+ [Manage repos programmatically with the Databricks REST API (Public Preview)](https://docs.databricks.com/release-notes/product/2021/august.html#manage-repos-programmatically-with-the-databricks-rest-api-public-preview)\n+ [Databricks Runtime 7.6 series support ends](https://docs.databricks.com/release-notes/product/2021/august.html#databricks-runtime-76-series-support-ends)\n+ [Log delivery APIs now report delivery status](https://docs.databricks.com/release-notes/product/2021/august.html#log-delivery-apis-now-report-delivery-status)\n+ [Use the AWS EBS SSD gp3 volume type for all clusters in a workspace](https://docs.databricks.com/release-notes/product/2021/august.html#use-the-aws-ebs-ssd-gp3-volume-type-for-all-clusters-in-a-workspace)\n+ [Audit events are logged when you interact with Databricks Repos](https://docs.databricks.com/release-notes/product/2021/august.html#audit-events-are-logged-when-you-interact-with-databricks-repos)\n+ [Improved job creation and management 
workflow](https://docs.databricks.com/release-notes/product/2021/august.html#improved-job-creation-and-management-workflow)\n+ [Simplified instructions for setting Git credentials (Public Preview)](https://docs.databricks.com/release-notes/product/2021/august.html#simplified-instructions-for-setting-git-credentials-public-preview)\n+ [Import multiple notebooks in `.html` format](https://docs.databricks.com/release-notes/product/2021/august.html#import-multiple-notebooks-in-html-format)\n+ [Usability improvements for Delta Live Tables](https://docs.databricks.com/release-notes/product/2021/august.html#usability-improvements-for-delta-live-tables)\n+ [Configure Databricks for SSO with Microsoft Entra ID in your Azure tenant](https://docs.databricks.com/release-notes/product/2021/august.html#configure-databricks-for-sso-with-microsoft-entra-id-in-your-azure-tenant)\n* [July 2021](https://docs.databricks.com/release-notes/product/2021/july.html)\n+ [Manage MLflow experiment permissions with the Databricks REST API](https://docs.databricks.com/release-notes/product/2021/july.html#manage-mlflow-experiment-permissions-with-the-databricks-rest-api)\n+ [Databricks web interface is localized in Portuguese and French (Public Preview)](https://docs.databricks.com/release-notes/product/2021/july.html#databricks-web-interface-is-localized-in-portuguese-and-french-public-preview)\n+ [Databricks Runtime 5.5 LTS for Machine Learning support ends, replaced by Extended Support version](https://docs.databricks.com/release-notes/product/2021/july.html#databricks-runtime-55-lts-for-machine-learning-support-ends-replaced-by-extended-support-version)\n+ [Databricks Light 2.4 support ends September 5, replaced by Extended Support version](https://docs.databricks.com/release-notes/product/2021/july.html#databricks-light-24-support-ends-september-5-replaced-by-extended-support-version)\n+ [Reduced permissions for cross-account IAM 
roles](https://docs.databricks.com/release-notes/product/2021/july.html#reduced-permissions-for-cross-account-iam-roles)\n+ [Feature freshness information available in Databricks Feature Store UI (Public Preview)](https://docs.databricks.com/release-notes/product/2021/july.html#feature-freshness-information-available-in-databricks-feature-store-ui-public-preview)\n+ [Display up to 10,000 result rows](https://docs.databricks.com/release-notes/product/2021/july.html#display-up-to-10000-result-rows)\n+ [Bulk import and export notebooks in a folder as source files](https://docs.databricks.com/release-notes/product/2021/july.html#bulk-import-and-export-notebooks-in-a-folder-as-source-files)\n+ [Autocomplete in SQL notebooks now uses all-caps for SQL keywords](https://docs.databricks.com/release-notes/product/2021/july.html#autocomplete-in-sql-notebooks-now-uses-all-caps-for-sql-keywords)\n+ [Reorderable and resizable widgets in notebooks](https://docs.databricks.com/release-notes/product/2021/july.html#reorderable-and-resizable-widgets-in-notebooks)\n+ [Databricks UI usability fixes](https://docs.databricks.com/release-notes/product/2021/july.html#databricks-ui-usability-fixes)\n+ [Quickly define pipeline settings when you create a new Delta Live Tables pipeline](https://docs.databricks.com/release-notes/product/2021/july.html#quickly-define-pipeline-settings-when-you-create-a-new-delta-live-tables-pipeline)\n+ [Databricks Runtime 8.4 and 8.4 ML are GA; 8.4 Photon is Public Preview](https://docs.databricks.com/release-notes/product/2021/july.html#databricks-runtime-84-and-84-ml-are-ga-84-photon-is-public-preview)\n+ [Use Spark SQL with the Delta Live Tables Python API](https://docs.databricks.com/release-notes/product/2021/july.html#use-spark-sql-with-the-delta-live-tables-python-api)\n+ [Enhanced data processing and analysis with Databricks jobs (Public 
Preview)](https://docs.databricks.com/release-notes/product/2021/july.html#enhanced-data-processing-and-analysis-with-databricks-jobs-public-preview)\n+ [Reduced cost for Delta Live Tables default clusters (Public Preview)](https://docs.databricks.com/release-notes/product/2021/july.html#reduced-cost-for-delta-live-tables-default-clusters-public-preview)\n+ [Sort pipelines by name in the Delta Live Tables UI (Public Preview)](https://docs.databricks.com/release-notes/product/2021/july.html#sort-pipelines-by-name-in-the-delta-live-tables-ui-public-preview)\n+ [Changes to Compute page](https://docs.databricks.com/release-notes/product/2021/july.html#changes-to-compute-page)\n+ [Databricks Runtime 5.5 LTS support ends, replaced by Databricks Runtime 5.5 Extended Support through the end of 2021](https://docs.databricks.com/release-notes/product/2021/july.html#databricks-runtime-55-lts-support-ends-replaced-by-databricks-runtime-55-extended-support-through-the-end-of-2021)\n+ [Repos API (Public Preview)](https://docs.databricks.com/release-notes/product/2021/july.html#repos-api-public-preview)\n+ [Databricks Runtime 8.4 (Beta)](https://docs.databricks.com/release-notes/product/2021/july.html#databricks-runtime-84-beta)\n* [June 2021](https://docs.databricks.com/release-notes/product/2021/june.html)\n+ [Correction: Repos for Git is enabled by default in new and existing workspaces in some regions](https://docs.databricks.com/release-notes/product/2021/june.html#correction-repos-for-git-is-enabled-by-default-in-new-and-existing-workspaces-in-some-regions)\n+ [Change to Feature Store permissions](https://docs.databricks.com/release-notes/product/2021/june.html#change-to-feature-store-permissions)\n+ [Improved access to results in the MLflow runs table](https://docs.databricks.com/release-notes/product/2021/june.html#improved-access-to-results-in-the-mlflow-runs-table)\n+ [Better cost visibility for Delta Live 
Tables](https://docs.databricks.com/release-notes/product/2021/june.html#better-cost-visibility-for-delta-live-tables)\n+ [Enhanced data quality constraints for Delta Live Tables](https://docs.databricks.com/release-notes/product/2021/june.html#enhanced-data-quality-constraints-for-delta-live-tables)\n+ [API changes for updating and replacing IP address lists](https://docs.databricks.com/release-notes/product/2021/june.html#api-changes-for-updating-and-replacing-ip-address-lists)\n+ [Databricks ODBC driver 2.6.17](https://docs.databricks.com/release-notes/product/2021/june.html#databricks-odbc-driver-2617)\n+ [Use an API to download usage data directly](https://docs.databricks.com/release-notes/product/2021/june.html#use-an-api-to-download-usage-data-directly)\n+ [Databricks Runtime 7.5 series support ends](https://docs.databricks.com/release-notes/product/2021/june.html#databricks-runtime-75-series-support-ends)\n+ [Optimize performance and control costs by using different pools for the driver node and worker nodes](https://docs.databricks.com/release-notes/product/2021/june.html#optimize-performance-and-control-costs-by-using-different-pools-for-the-driver-node-and-worker-nodes)\n+ [Photon runtimes now support `i3.xlarge` instances (Public Preview)](https://docs.databricks.com/release-notes/product/2021/june.html#photon-runtimes-now-support-i3xlarge-instances-public-preview)\n+ [Registry-wide permissions for Model Registry](https://docs.databricks.com/release-notes/product/2021/june.html#registry-wide-permissions-for-model-registry)\n+ [A user\u2019s home directory is no longer protected when you delete a user using the SCIM API](https://docs.databricks.com/release-notes/product/2021/june.html#a-users-home-directory-is-no-longer-protected-when-you-delete-a-user-using-the-scim-api)\n+ [Accelerate SQL workloads with Photon (Public Preview)](https://docs.databricks.com/release-notes/product/2021/june.html#accelerate-sql-workloads-with-photon-public-preview)\n+ 
[Databricks Runtime 8.3 and 8.3 ML are GA; 8.3 Photon is Public Preview](https://docs.databricks.com/release-notes/product/2021/june.html#databricks-runtime-83-and-83-ml-are-ga-83-photon-is-public-preview)\n+ [Python and SQL table access control (GA)](https://docs.databricks.com/release-notes/product/2021/june.html#python-and-sql-table-access-control-ga)\n+ [Jobs UI and API now show the owner of a job run](https://docs.databricks.com/release-notes/product/2021/june.html#jobs-ui-and-api-now-show-the-owner-of-a-job-run)\n+ [Protect sensitive Spark configuration properties and environment variables using secrets (Public Preview)](https://docs.databricks.com/release-notes/product/2021/june.html#protect-sensitive-spark-configuration-properties-and-environment-variables-using-secrets-public-preview)\n+ [Repos for Git is enabled by default in new and existing workspaces in some regions](https://docs.databricks.com/release-notes/product/2021/june.html#repos-for-git-is-enabled-by-default-in-new-and-existing-workspaces-in-some-regions)\n+ [Redesigned Workspace Settings UI](https://docs.databricks.com/release-notes/product/2021/june.html#redesigned-workspace-settings-ui)\n+ [Updates to `ListTokens` and `ListAllTokens` database queries expired tokens](https://docs.databricks.com/release-notes/product/2021/june.html#updates-to-listtokens-and-listalltokens-database-queries-expired-tokens)\n+ [Confirmation now required when granting or revoking Admin permissions](https://docs.databricks.com/release-notes/product/2021/june.html#confirmation-now-required-when-granting-or-revoking-admin-permissions)\n+ [Changes to keyboard shortcuts in the web UI](https://docs.databricks.com/release-notes/product/2021/june.html#changes-to-keyboard-shortcuts-in-the-web-ui)\n* [May 2021](https://docs.databricks.com/release-notes/product/2021/may.html)\n+ [Databricks Machine Learning: a data-native and collaborative solution for the full ML 
lifecycle](https://docs.databricks.com/release-notes/product/2021/may.html#databricks-machine-learning-a-data-native-and-collaborative-solution-for-the-full-ml-lifecycle)\n+ [SQL Analytics is renamed to Databricks SQL](https://docs.databricks.com/release-notes/product/2021/may.html#sql-analytics-is-renamed-to-databricks-sql)\n+ [Create and manage ETL pipelines using Delta Live Tables (Public Preview)](https://docs.databricks.com/release-notes/product/2021/may.html#create-and-manage-etl-pipelines-using-delta-live-tables-public-preview)\n+ [Reduced scope of required egress rules for customer-managed VPCs](https://docs.databricks.com/release-notes/product/2021/may.html#reduced-scope-of-required-egress-rules-for-customer-managed-vpcs)\n+ [Workspaces in the `eu-west-2` region now support AWS PrivateLink](https://docs.databricks.com/release-notes/product/2021/may.html#workspaces-in-the-eu-west-2-region-now-support-aws-privatelink)\n+ [Encrypt Databricks SQL queries and query history using your own key (Public Preview)](https://docs.databricks.com/release-notes/product/2021/may.html#encrypt-databricks-sql-queries-and-query-history-using-your-own-key-public-preview)\n+ [Increased limit for the number of terminated all-purpose clusters](https://docs.databricks.com/release-notes/product/2021/may.html#increased-limit-for-the-number-of-terminated-all-purpose-clusters)\n+ [Increased limit for the number of pinned clusters](https://docs.databricks.com/release-notes/product/2021/may.html#increased-limit-for-the-number-of-pinned-clusters)\n+ [Manage where notebook results are stored (Public Preview)](https://docs.databricks.com/release-notes/product/2021/may.html#manage-where-notebook-results-are-stored-public-preview)\n+ [The new improved account console is GA](https://docs.databricks.com/release-notes/product/2021/may.html#the-new-improved-account-console-is-ga)\n+ [Customer-managed keys for workspace storage (Public 
Preview)](https://docs.databricks.com/release-notes/product/2021/may.html#customer-managed-keys-for-workspace-storage-public-preview)\n+ [Changes to the Account API for customer-managed keys](https://docs.databricks.com/release-notes/product/2021/may.html#changes-to-the-account-api-for-customer-managed-keys)\n+ [Google Cloud Storage connector (GA)](https://docs.databricks.com/release-notes/product/2021/may.html#google-cloud-storage-connector-ga)\n+ [Databricks Runtime 7.4 series support ends](https://docs.databricks.com/release-notes/product/2021/may.html#databricks-runtime-74-series-support-ends)\n+ [Better governance with enhanced audit logging](https://docs.databricks.com/release-notes/product/2021/may.html#better-governance-with-enhanced-audit-logging)\n+ [Use SSO to authenticate to the account console (Public Preview)](https://docs.databricks.com/release-notes/product/2021/may.html#use-sso-to-authenticate-to-the-account-console-public-preview)\n+ [Repos users can now integrate with Azure DevOps using personal access tokens](https://docs.databricks.com/release-notes/product/2021/may.html#repos-users-can-now-integrate-with-azure-devops-using-personal-access-tokens)\n+ [Jobs service stability and scalability improvements (Public Preview)](https://docs.databricks.com/release-notes/product/2021/may.html#jobs-service-stability-and-scalability-improvements-public-preview)\n+ [Service principals provide API-only access to Databricks resources (Public Preview)](https://docs.databricks.com/release-notes/product/2021/may.html#service-principals-provide-api-only-access-to-databricks-resources-public-preview)\n* [April 2021](https://docs.databricks.com/release-notes/product/2021/april.html)\n+ [Databricks Runtime 8.2 (GA)](https://docs.databricks.com/release-notes/product/2021/april.html#databricks-runtime-82-ga)\n+ [AWS PrivateLink for Databricks workspaces (Public 
Preview)](https://docs.databricks.com/release-notes/product/2021/april.html#aws-privatelink-for-databricks-workspaces-public-preview)\n+ [Update running workspaces with new credentials or network configurations](https://docs.databricks.com/release-notes/product/2021/april.html#update-running-workspaces-with-new-credentials-or-network-configurations)\n+ [Databricks can now send in-product messages and product tours directly to your workspace (Public Preview)](https://docs.databricks.com/release-notes/product/2021/april.html#databricks-can-now-send-in-product-messages-and-product-tours-directly-to-your-workspace-public-preview)\n+ [Easier job management with the enhanced jobs user interface](https://docs.databricks.com/release-notes/product/2021/april.html#easier-job-management-with-the-enhanced-jobs-user-interface)\n+ [Cluster policy changes are applied automatically to existing clusters at restart and edit](https://docs.databricks.com/release-notes/product/2021/april.html#cluster-policy-changes-are-applied-automatically-to-existing-clusters-at-restart-and-edit)\n+ [Track retries in your job tasks when task attempts fail](https://docs.databricks.com/release-notes/product/2021/april.html#track-retries-in-your-job-tasks-when-task-attempts-fail)\n+ [Quickly view cluster details when you create a new cluster](https://docs.databricks.com/release-notes/product/2021/april.html#quickly-view-cluster-details-when-you-create-a-new-cluster)\n+ [MLflow sidebar reflects the most recent experiment](https://docs.databricks.com/release-notes/product/2021/april.html#mlflow-sidebar-reflects-the-most-recent-experiment)\n+ [Change to default channel for `conda.yaml` files in MLflow](https://docs.databricks.com/release-notes/product/2021/april.html#change-to-default-channel-for-condayaml-files-in-mlflow)\n+ [New free trial and pay-as-you-go customers are now on the E2 version of the 
platform](https://docs.databricks.com/release-notes/product/2021/april.html#new-free-trial-and-pay-as-you-go-customers-are-now-on-the-e2-version-of-the-platform)\n+ [Databricks Runtime 8.2 (Beta)](https://docs.databricks.com/release-notes/product/2021/april.html#databricks-runtime-82-beta)\n+ [User and group limits](https://docs.databricks.com/release-notes/product/2021/april.html#user-and-group-limits)\n+ [Easier monitoring of job run status](https://docs.databricks.com/release-notes/product/2021/april.html#easier-monitoring-of-job-run-status)\n+ [Better governance with enhanced audit logging](https://docs.databricks.com/release-notes/product/2021/april.html#better-governance-with-enhanced-audit-logging)\n+ [Global init scripts no longer run on model serving clusters](https://docs.databricks.com/release-notes/product/2021/april.html#global-init-scripts-no-longer-run-on-model-serving-clusters)\n+ [Databricks Runtime 6.4 series support ends](https://docs.databricks.com/release-notes/product/2021/april.html#databricks-runtime-64-series-support-ends)\n* [March 2021](https://docs.databricks.com/release-notes/product/2021/march.html)\n+ [Databricks now supports dark mode for viewing notebooks](https://docs.databricks.com/release-notes/product/2021/march.html#databricks-now-supports-dark-mode-for-viewing-notebooks)\n+ [Databricks Runtime 8.1 (GA)](https://docs.databricks.com/release-notes/product/2021/march.html#databricks-runtime-81-ga)\n+ [Easier job creation and management with the enhanced jobs user interface (Public Preview)](https://docs.databricks.com/release-notes/product/2021/march.html#easier-job-creation-and-management-with-the-enhanced-jobs-user-interface-public-preview)\n+ [Track job retry attempts with a new sequential value returned for each job run attempt](https://docs.databricks.com/release-notes/product/2021/march.html#track-job-retry-attempts-with-a-new-sequential-value-returned-for-each-job-run-attempt)\n+ [Increased limit for the number of saved 
jobs in Premium and Enterprise workspaces](https://docs.databricks.com/release-notes/product/2021/march.html#increased-limit-for-the-number-of-saved-jobs-in-premium-and-enterprise-workspaces)\n+ [Easier way to connect to Databricks from your favorite BI tools and SQL clients](https://docs.databricks.com/release-notes/product/2021/march.html#easier-way-to-connect-to-databricks-from-your-favorite-bi-tools-and-sql-clients)\n+ [Databricks Repos let you use Git repositories to integrate Databricks with CI/CD systems](https://docs.databricks.com/release-notes/product/2021/march.html#databricks-repos-let-you-use-git-repositories-to-integrate-databricks-with-cicd-systems)\n+ [Automatic retries for failed job clusters reverted](https://docs.databricks.com/release-notes/product/2021/march.html#automatic-retries-for-failed-job-clusters-reverted)\n+ [Databricks Runtime 8.1 (Beta)](https://docs.databricks.com/release-notes/product/2021/march.html#databricks-runtime-81-beta)\n+ [Limit username and password authentication with password ACLs (GA)](https://docs.databricks.com/release-notes/product/2021/march.html#limit-username-and-password-authentication-with-password-acls-ga)\n+ [Receive email notification about activity in Model Registry](https://docs.databricks.com/release-notes/product/2021/march.html#receive-email-notification-about-activity-in-model-registry)\n+ [Model Serving now supports additional model types](https://docs.databricks.com/release-notes/product/2021/march.html#model-serving-now-supports-additional-model-types)\n+ [New options for searching Model Registry](https://docs.databricks.com/release-notes/product/2021/march.html#new-options-for-searching-model-registry)\n+ [Increased limit for the number of terminated all-purpose clusters](https://docs.databricks.com/release-notes/product/2021/march.html#increased-limit-for-the-number-of-terminated-all-purpose-clusters)\n+ [Increased limit for the number of pinned clusters in a 
workspace](https://docs.databricks.com/release-notes/product/2021/march.html#increased-limit-for-the-number-of-pinned-clusters-in-a-workspace)\n+ [Databricks Runtime 8.0 (GA)](https://docs.databricks.com/release-notes/product/2021/march.html#databricks-runtime-80-ga)\n* [February 2021](https://docs.databricks.com/release-notes/product/2021/february.html)\n+ [New Databricks Power BI connector (GA)](https://docs.databricks.com/release-notes/product/2021/february.html#new-databricks-power-bi-connector-ga)\n+ [Add and manage account admins using the SCIM API (E2 accounts, Public Preview)](https://docs.databricks.com/release-notes/product/2021/february.html#add-and-manage-account-admins-using-the-scim-api-e2-accounts-public-preview)\n+ [Added modification\\_time to the DBFS REST API get-status and list responses](https://docs.databricks.com/release-notes/product/2021/february.html#added-modification_time-to-the-dbfs-rest-api-get-status-and-list-responses)\n+ [Easily copy long experiment names in MLflow](https://docs.databricks.com/release-notes/product/2021/february.html#easily-copy-long-experiment-names-in-mlflow)\n+ [Adjust memory size and number of cores for serving clusters](https://docs.databricks.com/release-notes/product/2021/february.html#adjust-memory-size-and-number-of-cores-for-serving-clusters)\n+ [Web terminal is now GA](https://docs.databricks.com/release-notes/product/2021/february.html#web-terminal-is-now-ga)\n+ [Auto-AZ: automatic selection of availability zone (AZ) when you launch clusters, available on all deployment types](https://docs.databricks.com/release-notes/product/2021/february.html#auto-az-automatic-selection-of-availability-zone-az-when-you-launch-clusters-available-on-all-deployment-types)\n+ [Delegate account management beyond the account owner (E2 accounts, Public Preview)](https://docs.databricks.com/release-notes/product/2021/february.html#delegate-account-management-beyond-the-account-owner-e2-accounts-public-preview)\n+ [Separate 
account-level and workspace-level audit logging configurations help you monitor account and workspace activity more effectively](https://docs.databricks.com/release-notes/product/2021/february.html#separate-account-level-and-workspace-level-audit-logging-configurations-help-you-monitor-account-and-workspace-activity-more-effectively)\n+ [Databricks Runtime 7.2 series support ends](https://docs.databricks.com/release-notes/product/2021/february.html#databricks-runtime-72-series-support-ends)\n+ [Databricks Runtime 7.6 GA](https://docs.databricks.com/release-notes/product/2021/february.html#databricks-runtime-76-ga)\n+ [Databricks Runtime 8.0 (Beta)](https://docs.databricks.com/release-notes/product/2021/february.html#databricks-runtime-80-beta)\n+ [Databricks Runtime for Genomics now deprecated](https://docs.databricks.com/release-notes/product/2021/february.html#databricks-runtime-for-genomics-now-deprecated)\n+ [View more readable JSON in the MLflow run artifact display](https://docs.databricks.com/release-notes/product/2021/february.html#view-more-readable-json-in-the-mlflow-run-artifact-display)\n+ [Provide comments in the Model Registry using REST API](https://docs.databricks.com/release-notes/product/2021/february.html#provide-comments-in-the-model-registry-using-rest-api)\n+ [Easily specify default cluster values in API calls](https://docs.databricks.com/release-notes/product/2021/february.html#easily-specify-default-cluster-values-in-api-calls)\n+ [Tune cluster worker configuration according to current worker allocation](https://docs.databricks.com/release-notes/product/2021/february.html#tune-cluster-worker-configuration-according-to-current-worker-allocation)\n+ [Pass context specific information to a job\u2019s task with task parameter variables](https://docs.databricks.com/release-notes/product/2021/february.html#pass-context-specific-information-to-a-jobs-task-with-task-parameter-variables)\n+ [Error messages from job failures no longer contain possibly 
sensitive information](https://docs.databricks.com/release-notes/product/2021/february.html#error-messages-from-job-failures-no-longer-contain-possibly-sensitive-information)\n+ [Download usage data from the new account console for E2 accounts](https://docs.databricks.com/release-notes/product/2021/february.html#download-usage-data-from-the-new-account-console-for-e2-accounts)\n* [January 2021](https://docs.databricks.com/release-notes/product/2021/january.html)\n+ [Databricks Runtime 7.1 series support ends](https://docs.databricks.com/release-notes/product/2021/january.html#databricks-runtime-71-series-support-ends)\n+ [Start clusters faster with Docker images preloaded into instance pools](https://docs.databricks.com/release-notes/product/2021/january.html#start-clusters-faster-with-docker-images-preloaded-into-instance-pools)\n+ [Notebook find and replace now supports changing all occurrences of a match](https://docs.databricks.com/release-notes/product/2021/january.html#notebook-find-and-replace-now-supports-changing-all-occurrences-of-a-match)\n+ [Single Node clusters (GA)](https://docs.databricks.com/release-notes/product/2021/january.html#single-node-clusters-ga)\n+ [Free form cluster policy type renamed to Unrestricted](https://docs.databricks.com/release-notes/product/2021/january.html#free-form-cluster-policy-type-renamed-to-unrestricted)\n+ [Cluster policy field not shown if a user only has access to one policy](https://docs.databricks.com/release-notes/product/2021/january.html#cluster-policy-field-not-shown-if-a-user-only-has-access-to-one-policy)\n+ [G4 family of GPU-accelerated EC2 instances GA](https://docs.databricks.com/release-notes/product/2021/january.html#g4-family-of-gpu-accelerated-ec2-instances-ga)\n+ [Databricks Runtime 7.0 series support ends](https://docs.databricks.com/release-notes/product/2021/january.html#databricks-runtime-70-series-support-ends)\n+ [Billable usage and audit log S3 bucket policy and object ACL 
changes](https://docs.databricks.com/release-notes/product/2021/january.html#billable-usage-and-audit-log-s3-bucket-policy-and-object-acl-changes)\n+ [E2 platform comes to the Asia Pacific region](https://docs.databricks.com/release-notes/product/2021/january.html#e2-platform-comes-to-the-asia-pacific-region)\n* [December 2020](https://docs.databricks.com/release-notes/product/2020/december.html)\n+ [Databricks Runtime 7.5 GA](https://docs.databricks.com/release-notes/product/2020/december.html#databricks-runtime-75-ga)\n+ [Existing Databricks accounts migrate to E2 platform today](https://docs.databricks.com/release-notes/product/2020/december.html#existing-databricks-accounts-migrate-to-e2-platform-today)\n+ [Jobs API now supports updating existing jobs](https://docs.databricks.com/release-notes/product/2020/december.html#jobs-api-now-supports-updating-existing-jobs)\n+ [New global init script framework is GA](https://docs.databricks.com/release-notes/product/2020/december.html#new-global-init-script-framework-is-ga)\n+ [New account console enables customers on the E2 platform to create and manage multiple workspaces (Public Preview)](https://docs.databricks.com/release-notes/product/2020/december.html#new-account-console-enables-customers-on-the-e2-platform-to-create-and-manage-multiple-workspaces-public-preview)\n+ [Databricks Runtime 7.5 (Beta)](https://docs.databricks.com/release-notes/product/2020/december.html#databricks-runtime-75-beta)\n+ [Auto-AZ: automatic selection of availability zone (AZ) when you launch clusters](https://docs.databricks.com/release-notes/product/2020/december.html#auto-az-automatic-selection-of-availability-zone-az-when-you-launch-clusters)\n+ [Jobs API end\\_time field now uses epoch time](https://docs.databricks.com/release-notes/product/2020/december.html#jobs-api-end_time-field-now-uses-epoch-time)\n+ [Find DBFS files using new visual 
browser](https://docs.databricks.com/release-notes/product/2020/december.html#find-dbfs-files-using-new-visual-browser)\n+ [Visibility controls for jobs, clusters, notebooks, and other workspace objects are now enabled by default on new workspaces](https://docs.databricks.com/release-notes/product/2020/december.html#visibility-controls-for-jobs-clusters-notebooks-and-other-workspace-objects-are-now-enabled-by-default-on-new-workspaces)\n+ [Improved display of nested runs in MLflow](https://docs.databricks.com/release-notes/product/2020/december.html#improved-display-of-nested-runs-in-mlflow)\n+ [Admins can now lock user accounts (Public Preview)](https://docs.databricks.com/release-notes/product/2020/december.html#admins-can-now-lock-user-accounts-public-preview)\n+ [Updated NVIDIA driver](https://docs.databricks.com/release-notes/product/2020/december.html#updated-nvidia-driver)\n+ [Use your own keys to secure notebooks (Public Preview)](https://docs.databricks.com/release-notes/product/2020/december.html#use-your-own-keys-to-secure-notebooks-public-preview)\n* [November 2020](https://docs.databricks.com/release-notes/product/2020/november.html)\n+ [Databricks Runtime 6.6 series support ends](https://docs.databricks.com/release-notes/product/2020/november.html#databricks-runtime-66-series-support-ends)\n+ [MLflow Model Registry GA](https://docs.databricks.com/release-notes/product/2020/november.html#mlflow-model-registry-ga)\n+ [Filter experiment runs based on whether a registered model is associated](https://docs.databricks.com/release-notes/product/2020/november.html#filter-experiment-runs-based-on-whether-a-registered-model-is-associated)\n+ [Partner integrations gallery now available through the Data tab](https://docs.databricks.com/release-notes/product/2020/november.html#partner-integrations-gallery-now-available-through-the-data-tab)\n+ [Cluster policies now use *allowlist* and *blocklist* as policy type 
names](https://docs.databricks.com/release-notes/product/2020/november.html#cluster-policies-now-use-allowlist-and-blocklist-as-policy-type-names)\n+ [Automatic retries when the creation of a job cluster fails](https://docs.databricks.com/release-notes/product/2020/november.html#automatic-retries-when-the-creation-of-a-job-cluster-fails)\n+ [Navigate notebooks using the table of contents](https://docs.databricks.com/release-notes/product/2020/november.html#navigate-notebooks-using-the-table-of-contents)\n+ [Databricks SQL (Public Preview)](https://docs.databricks.com/release-notes/product/2020/november.html#databricks-sql-public-preview)\n+ [Web terminal available on Databricks Community Edition](https://docs.databricks.com/release-notes/product/2020/november.html#web-terminal-available-on-databricks-community-edition)\n+ [Single Node clusters now support Databricks Container Services](https://docs.databricks.com/release-notes/product/2020/november.html#single-node-clusters-now-support-databricks-container-services)\n+ [Databricks Runtime 7.4 GA](https://docs.databricks.com/release-notes/product/2020/november.html#databricks-runtime-74-ga)\n+ [Databricks JDBC driver update](https://docs.databricks.com/release-notes/product/2020/november.html#databricks-jdbc-driver-update)\n+ [Databricks Connect 7.3 (Beta)](https://docs.databricks.com/release-notes/product/2020/november.html#databricks-connect-73-beta)\n* [October 2020](https://docs.databricks.com/release-notes/product/2020/october.html)\n+ [New Databricks Power BI connector available in the online Power BI service (Public Preview)](https://docs.databricks.com/release-notes/product/2020/october.html#new-databricks-power-bi-connector-available-in-the-online-power-bi-service-public-preview)\n+ [Databricks Runtime 7.4 (Beta)](https://docs.databricks.com/release-notes/product/2020/october.html#databricks-runtime-74-beta)\n+ [Expanded experiment access control 
(ACLs)](https://docs.databricks.com/release-notes/product/2020/october.html#expanded-experiment-access-control-acls)\n+ [High fidelity import and export of Jupyter notebook (ipynb) files](https://docs.databricks.com/release-notes/product/2020/october.html#high-fidelity-import-and-export-of-jupyter-notebook-ipynb-files)\n+ [SCIM API improvement: both indirect and direct groups returned in user record response](https://docs.databricks.com/release-notes/product/2020/october.html#scim-api-improvement-both-indirect-and-direct-groups-returned-in-user-record-response)\n+ [Databricks Runtime 6.5 series support ends](https://docs.databricks.com/release-notes/product/2020/october.html#databricks-runtime-65-series-support-ends)\n+ [Self-service, low-latency audit log configuration (Public Preview)](https://docs.databricks.com/release-notes/product/2020/october.html#self-service-low-latency-audit-log-configuration-public-preview)\n+ [SCIM API improvement: `$ref` field response](https://docs.databricks.com/release-notes/product/2020/october.html#scim-api-improvement-ref-field-response)\n+ [Databricks Runtime 7.3, 7.3 ML, and 7.3 Genomics declared Long Term Support (LTS)](https://docs.databricks.com/release-notes/product/2020/october.html#databricks-runtime-73-73-ml-and-73-genomics-declared-long-term-support-lts)\n+ [Render images at higher resolution using matplotlib](https://docs.databricks.com/release-notes/product/2020/october.html#render-images-at-higher-resolution-using-matplotlib)\n* [September 2020](https://docs.databricks.com/release-notes/product/2020/september.html)\n+ [Databricks Runtime 7.3, 7.3 ML, and 7.3 Genomics are now GA](https://docs.databricks.com/release-notes/product/2020/september.html#databricks-runtime-73-73-ml-and-73-genomics-are-now-ga)\n+ [Debugging hints for SAML credential passthrough misconfigurations](https://docs.databricks.com/release-notes/product/2020/september.html#debugging-hints-for-saml-credential-passthrough-misconfigurations)\n+ [Single 
Node clusters (Public Preview)](https://docs.databricks.com/release-notes/product/2020/september.html#single-node-clusters-public-preview)\n+ [DBFS REST API rate limiting](https://docs.databricks.com/release-notes/product/2020/september.html#dbfs-rest-api-rate-limiting)\n+ [New sidebar icons](https://docs.databricks.com/release-notes/product/2020/september.html#new-sidebar-icons)\n+ [Running jobs limit increase](https://docs.databricks.com/release-notes/product/2020/september.html#running-jobs-limit-increase)\n+ [Artifact access control lists (ACLs) in MLflow](https://docs.databricks.com/release-notes/product/2020/september.html#artifact-access-control-lists-acls-in-mlflow)\n+ [MLflow usability improvements](https://docs.databricks.com/release-notes/product/2020/september.html#mlflow-usability-improvements)\n+ [New Databricks Power BI connector (Public Preview)](https://docs.databricks.com/release-notes/product/2020/september.html#new-databricks-power-bi-connector-public-preview)\n+ [New JDBC and ODBC drivers bring faster and lower latency BI](https://docs.databricks.com/release-notes/product/2020/september.html#new-jdbc-and-odbc-drivers-bring-faster-and-lower-latency-bi)\n+ [MLflow Model Serving (Public Preview)](https://docs.databricks.com/release-notes/product/2020/september.html#mlflow-model-serving-public-preview)\n+ [Clusters UI improvements](https://docs.databricks.com/release-notes/product/2020/september.html#clusters-ui-improvements)\n+ [Visibility controls for jobs, clusters, notebooks, and other workspace objects](https://docs.databricks.com/release-notes/product/2020/september.html#visibility-controls-for-jobs-clusters-notebooks-and-other-workspace-objects)\n+ [Ability to create tokens no longer permitted by default](https://docs.databricks.com/release-notes/product/2020/september.html#ability-to-create-tokens-no-longer-permitted-by-default)\n+ [Support for c5.24xlarge 
instances](https://docs.databricks.com/release-notes/product/2020/september.html#support-for-c524xlarge-instances)\n+ [MLflow Model Registry supports sharing of models across workspaces](https://docs.databricks.com/release-notes/product/2020/september.html#mlflow-model-registry-supports-sharing-of-models-across-workspaces)\n+ [Databricks Runtime 7.3 (Beta)](https://docs.databricks.com/release-notes/product/2020/september.html#databricks-runtime-73-beta)\n+ [E2 architecture\u2014now GA\u2014provides better security, scalability, and management tools](https://docs.databricks.com/release-notes/product/2020/september.html#e2-architecture---now-ga---provides-better-security-scalability-and-management-tools)\n+ [Account API is generally available on the E2 version of the platform](https://docs.databricks.com/release-notes/product/2020/september.html#account-api-is-generally-available-on-the-e2-version-of-the-platform)\n+ [Secure cluster connectivity (no public IPs) is now the default on the E2 version of the platform](https://docs.databricks.com/release-notes/product/2020/september.html#secure-cluster-connectivity-no-public-ips-is-now-the-default-on-the-e2-version-of-the-platform)\n* [August 2020](https://docs.databricks.com/release-notes/product/2020/august.html)\n+ [Token Management API is GA and admins can use the Admin Console to grant and revoke user access to tokens](https://docs.databricks.com/release-notes/product/2020/august.html#token-management-api-is-ga-and-admins-can-use-the-admin-console-to-grant-and-revoke-user-access-to-tokens)\n+ [Message size limits for Shiny apps increased](https://docs.databricks.com/release-notes/product/2020/august.html#message-size-limits-for-shiny-apps-increased)\n+ [Improved instructions for setting up a cluster in local mode](https://docs.databricks.com/release-notes/product/2020/august.html#improved-instructions-for-setting-up-a-cluster-in-local-mode)\n+ [View version of notebook associated with a 
run](https://docs.databricks.com/release-notes/product/2020/august.html#view-version-of-notebook-associated-with-a-run)\n+ [Databricks Runtime 7.2 GA](https://docs.databricks.com/release-notes/product/2020/august.html#databricks-runtime-72-ga)\n+ [Databricks Runtime 7.2 ML GA](https://docs.databricks.com/release-notes/product/2020/august.html#databricks-runtime-72-ml-ga)\n+ [Databricks Runtime 7.2 Genomics GA](https://docs.databricks.com/release-notes/product/2020/august.html#databricks-runtime-72-genomics-ga)\n+ [Permissions API (Public Preview)](https://docs.databricks.com/release-notes/product/2020/august.html#permissions-api-public-preview)\n+ [Databricks Connect 7.1 (GA)](https://docs.databricks.com/release-notes/product/2020/august.html#databricks-connect-71-ga)\n+ [Repeatable installation order for cluster libraries](https://docs.databricks.com/release-notes/product/2020/august.html#repeatable-installation-order-for-cluster-libraries)\n+ [Customer-managed VPC is GA](https://docs.databricks.com/release-notes/product/2020/august.html#customer-managed-vpc-is-ga)\n+ [Secure cluster connectivity (no public IPs) is GA](https://docs.databricks.com/release-notes/product/2020/august.html#secure-cluster-connectivity-no-public-ips-is-ga)\n+ [Multi-workspace API (Account API) adds pricing tier](https://docs.databricks.com/release-notes/product/2020/august.html#multi-workspace-api-account-api-adds-pricing-tier)\n+ [Create model from MLflow registered models page (Public Preview)](https://docs.databricks.com/release-notes/product/2020/august.html#create-model-from-mlflow-registered-models-page-public-preview)\n+ [Databricks Container Services supports GPU images](https://docs.databricks.com/release-notes/product/2020/august.html#databricks-container-services-supports-gpu-images)\n* [July 2020](https://docs.databricks.com/release-notes/product/2020/july.html)\n+ [Web terminal (Public 
Preview)](https://docs.databricks.com/release-notes/product/2020/july.html#web-terminal-public-preview)\n+ [New, more secure global init script framework (Public Preview)](https://docs.databricks.com/release-notes/product/2020/july.html#new-more-secure-global-init-script-framework-public-preview)\n+ [IP access lists now GA](https://docs.databricks.com/release-notes/product/2020/july.html#ip-access-lists-now-ga)\n+ [New file upload dialog](https://docs.databricks.com/release-notes/product/2020/july.html#new-file-upload-dialog)\n+ [SCIM API filter and sort improvements](https://docs.databricks.com/release-notes/product/2020/july.html#scim-api-filter-and-sort-improvements)\n+ [Databricks Runtime 7.1 GA](https://docs.databricks.com/release-notes/product/2020/july.html#databricks-runtime-71-ga)\n+ [Databricks Runtime 7.1 ML GA](https://docs.databricks.com/release-notes/product/2020/july.html#databricks-runtime-71-ml-ga)\n+ [Databricks Runtime 7.1 Genomics GA](https://docs.databricks.com/release-notes/product/2020/july.html#databricks-runtime-71-genomics-ga)\n+ [Databricks Connect 7.1 (Public Preview)](https://docs.databricks.com/release-notes/product/2020/july.html#databricks-connect-71-public-preview)\n+ [IP Access List API updates](https://docs.databricks.com/release-notes/product/2020/july.html#ip-access-list-api-updates)\n+ [Python notebooks now support multiple outputs per cell](https://docs.databricks.com/release-notes/product/2020/july.html#python-notebooks-now-support-multiple-outputs-per-cell)\n+ [View notebook code and results cells side by side](https://docs.databricks.com/release-notes/product/2020/july.html#view-notebook-code-and-results-cells-side-by-side)\n+ [Pause job schedules](https://docs.databricks.com/release-notes/product/2020/july.html#pause-job-schedules)\n+ [Jobs API endpoints validate run ID](https://docs.databricks.com/release-notes/product/2020/july.html#jobs-api-endpoints-validate-run-id)\n+ [Format SQL in notebooks 
automatically](https://docs.databricks.com/release-notes/product/2020/july.html#format-sql-in-notebooks-automatically)\n+ [Support for r5.8xlarge and r5.16xlarge instances](https://docs.databricks.com/release-notes/product/2020/july.html#support-for-r58xlarge-and-r516xlarge-instances)\n+ [Use password access control to configure which users are required to log in using SSO or authenticate using tokens (Public Preview)](https://docs.databricks.com/release-notes/product/2020/july.html#use-password-access-control-to-configure-which-users-are-required-to-log-in-using-sso-or-authenticate-using-tokens-public-preview)\n+ [Reproducible order of installation for Maven and CRAN libraries](https://docs.databricks.com/release-notes/product/2020/july.html#reproducible-order-of-installation-for-maven-and-cran-libraries)\n+ [Take control of your users\u2019 personal access tokens with the Token Management API (Public Preview)](https://docs.databricks.com/release-notes/product/2020/july.html#take-control-of-your-users-personal-access-tokens-with-the-token-management-api-public-preview)\n+ [Customer-managed VPC deployments (Public Preview) can now use regional VPC endpoints](https://docs.databricks.com/release-notes/product/2020/july.html#customer-managed-vpc-deployments-public-preview-can-now-use-regional-vpc-endpoints)\n+ [Encrypt traffic between cluster worker nodes (Public Preview)](https://docs.databricks.com/release-notes/product/2020/july.html#encrypt-traffic-between-cluster-worker-nodes-public-preview)\n+ [Table access control supported on all accounts with the Premium plan (Public Preview)](https://docs.databricks.com/release-notes/product/2020/july.html#table-access-control-supported-on-all-accounts-with-the-premium-plan-public-preview)\n+ [IAM credential passthrough supported on all accounts with the Premium plan (Public 
Preview)](https://docs.databricks.com/release-notes/product/2020/july.html#iam-credential-passthrough-supported-on-all-accounts-with-the-premium-plan-public-preview)\n+ [Restore cut notebook cells](https://docs.databricks.com/release-notes/product/2020/july.html#restore-cut-notebook-cells)\n+ [Assign jobs CAN MANAGE permission to non-admin users](https://docs.databricks.com/release-notes/product/2020/july.html#assign-jobs-can-manage-permission-to-non-admin-users)\n+ [Non-admin Databricks users can view and filter by username using the SCIM API](https://docs.databricks.com/release-notes/product/2020/july.html#non-admin-databricks-users-can-view-and-filter-by-username-using-the-scim-api)\n+ [Link to view cluster specification when you view job run details](https://docs.databricks.com/release-notes/product/2020/july.html#link-to-view-cluster-specification-when-you-view-job-run-details)\n* [June 2020](https://docs.databricks.com/release-notes/product/2020/june.html)\n+ [Billable usage logs delivered to your own S3 bucket (Public Preview)](https://docs.databricks.com/release-notes/product/2020/june.html#billable-usage-logs-delivered-to-your-own-s3-bucket-public-preview)\n+ [Databricks Connect now supports Databricks Runtime 6.6](https://docs.databricks.com/release-notes/product/2020/june.html#databricks-connect-now-supports-databricks-runtime-66)\n+ [Databricks Runtime 7.0 ML GA](https://docs.databricks.com/release-notes/product/2020/june.html#databricks-runtime-70-ml-ga)\n+ [Databricks Runtime 7.0 GA, powered by Apache Spark 3.0](https://docs.databricks.com/release-notes/product/2020/june.html#databricks-runtime-70-ga-powered-by-apache-spark-30)\n+ [Databricks Runtime 7.0 for Genomics GA](https://docs.databricks.com/release-notes/product/2020/june.html#databricks-runtime-70-for-genomics-ga)\n+ [Stage-dependent access controls for MLflow models](https://docs.databricks.com/release-notes/product/2020/june.html#stage-dependent-access-controls-for-mlflow-models)\n+ 
[Notebooks now support disabling auto-scroll](https://docs.databricks.com/release-notes/product/2020/june.html#notebooks-now-support-disabling-auto-scroll)\n+ [Skipping instance profile validation now available in the UI](https://docs.databricks.com/release-notes/product/2020/june.html#skipping-instance-profile-validation-now-available-in-the-ui)\n+ [Account ID is displayed in account console](https://docs.databricks.com/release-notes/product/2020/june.html#account-id-is-displayed-in-account-console)\n+ [Internet Explorer 11 support ends on August 15](https://docs.databricks.com/release-notes/product/2020/june.html#internet-explorer-11-support-ends-on-august-15)\n+ [Databricks Runtime 6.2 series support ends](https://docs.databricks.com/release-notes/product/2020/june.html#databricks-runtime-62-series-support-ends)\n+ [Simplify and control cluster creation using cluster policies (Public Preview)](https://docs.databricks.com/release-notes/product/2020/june.html#simplify-and-control-cluster-creation-using-cluster-policies-public-preview)\n+ [SCIM Me endpoint now returns SCIM compliant response](https://docs.databricks.com/release-notes/product/2020/june.html#scim-me-endpoint-now-returns-scim-compliant-response)\n+ [G4 family of GPU-accelerated EC2 instances now available for machine learning application deployments (Beta)](https://docs.databricks.com/release-notes/product/2020/june.html#g4-family-of-gpu-accelerated-ec2-instances-now-available-for-machine-learning-application-deployments-beta)\n+ [Deploy multiple workspaces in your Databricks account (Public Preview)](https://docs.databricks.com/release-notes/product/2020/june.html#deploy-multiple-workspaces-in-your-databricks-account-public-preview)\n+ [Deploy Databricks workspaces in your own VPC (Public Preview)](https://docs.databricks.com/release-notes/product/2020/june.html#deploy-databricks-workspaces-in-your-own-vpc-public-preview)\n+ [Secure cluster connectivity with no open ports on your VPCs and no public 
IP addresses on Databricks workers (Public Preview)](https://docs.databricks.com/release-notes/product/2020/june.html#secure-cluster-connectivity-with-no-open-ports-on-your-vpcs-and-no-public-ip-addresses-on-databricks-workers-public-preview)\n+ [Restrict access to Databricks using IP access lists (Public Preview)](https://docs.databricks.com/release-notes/product/2020/june.html#restrict-access-to-databricks-using-ip-access-lists-public-preview)\n+ [Encrypt locally attached disks (Public Preview)](https://docs.databricks.com/release-notes/product/2020/june.html#encrypt-locally-attached-disks-public-preview)\n* [May 2020](https://docs.databricks.com/release-notes/product/2020/may.html)\n+ [Databricks Runtime 6.6 for Genomics GA](https://docs.databricks.com/release-notes/product/2020/may.html#databricks-runtime-66-for-genomics-ga)\n+ [Databricks Runtime 6.6 ML GA](https://docs.databricks.com/release-notes/product/2020/may.html#databricks-runtime-66-ml-ga)\n+ [Databricks Runtime 6.6 GA](https://docs.databricks.com/release-notes/product/2020/may.html#databricks-runtime-66-ga)\n+ [Easily view large numbers of MLflow registered models](https://docs.databricks.com/release-notes/product/2020/may.html#easily-view-large-numbers-of-mlflow-registered-models)\n+ [Libraries configured to be installed on all clusters are not installed on clusters running Databricks Runtime 7.0 and above](https://docs.databricks.com/release-notes/product/2020/may.html#libraries-configured-to-be-installed-on-all-clusters-are-not-installed-on-clusters-running-databricks-runtime-70-and-above)\n+ [Databricks Runtime 7.0 for Genomics (Beta)](https://docs.databricks.com/release-notes/product/2020/may.html#databricks-runtime-70-for-genomics-beta)\n+ [Databricks Runtime 7.0 ML (Beta)](https://docs.databricks.com/release-notes/product/2020/may.html#databricks-runtime-70-ml-beta)\n+ [Databricks Runtime 6.6 for Genomics 
(Beta)](https://docs.databricks.com/release-notes/product/2020/may.html#databricks-runtime-66-for-genomics-beta)\n+ [Databricks Runtime 6.6 ML (Beta)](https://docs.databricks.com/release-notes/product/2020/may.html#databricks-runtime-66-ml-beta)\n+ [Databricks Runtime 6.6 (Beta)](https://docs.databricks.com/release-notes/product/2020/may.html#databricks-runtime-66-beta)\n+ [Job clusters now tagged with job name and ID](https://docs.databricks.com/release-notes/product/2020/may.html#job-clusters-now-tagged-with-job-name-and-id)\n+ [DBFS REST API delete endpoint size limit](https://docs.databricks.com/release-notes/product/2020/may.html#dbfs-rest-api-delete-endpoint-size-limit)\n+ [Restore deleted notebook cells](https://docs.databricks.com/release-notes/product/2020/may.html#restore-deleted-notebook-cells)\n+ [Jobs pending queue limit](https://docs.databricks.com/release-notes/product/2020/may.html#jobs-pending-queue-limit)\n* [April 2020](https://docs.databricks.com/release-notes/product/2020/april.html)\n+ [MLflow tracking UI enhancement](https://docs.databricks.com/release-notes/product/2020/april.html#mlflow-tracking-ui-enhancement)\n+ [Notebook usability improvements](https://docs.databricks.com/release-notes/product/2020/april.html#notebook-usability-improvements)\n+ [Databricks Connect now supports Databricks Runtime 6.5](https://docs.databricks.com/release-notes/product/2020/april.html#databricks-connect-now-supports-databricks-runtime-65)\n+ [Databricks Runtime 6.1 and 6.1 ML support ends](https://docs.databricks.com/release-notes/product/2020/april.html#databricks-runtime-61-and-61-ml-support-ends)\n+ [Databricks Runtime 6.5 GA](https://docs.databricks.com/release-notes/product/2020/april.html#databricks-runtime-65-ga)\n+ [Databricks Runtime 6.5 for Machine Learning GA](https://docs.databricks.com/release-notes/product/2020/april.html#databricks-runtime-65-for-machine-learning-ga)\n+ [Databricks Runtime 6.5 for Genomics 
GA](https://docs.databricks.com/release-notes/product/2020/april.html#databricks-runtime-65-for-genomics-ga)\n+ [Authenticate to S3 buckets automatically using your IAM credentials (Public Preview)](https://docs.databricks.com/release-notes/product/2020/april.html#authenticate-to-s3-buckets-automatically-using-your-iam-credentials-public-preview)\n+ [IAM role renamed to instance profile](https://docs.databricks.com/release-notes/product/2020/april.html#iam-role-renamed-to-instance-profile)\n+ [Easier notebook title changes](https://docs.databricks.com/release-notes/product/2020/april.html#easier-notebook-title-changes)\n+ [Cluster termination reporting enhancement](https://docs.databricks.com/release-notes/product/2020/april.html#cluster-termination-reporting-enhancement)\n+ [DBFS REST API delete endpoint size limit](https://docs.databricks.com/release-notes/product/2020/april.html#dbfs-rest-api-delete-endpoint-size-limit)\n+ [Databricks Runtime 6.0 and 6.0 ML support ends](https://docs.databricks.com/release-notes/product/2020/april.html#databricks-runtime-60-and-60-ml-support-ends)\n* [March 2020](https://docs.databricks.com/release-notes/product/2020/march.html)\n+ [Managed MLflow Model Registry collaborative hub available (Public Preview)](https://docs.databricks.com/release-notes/product/2020/march.html#managed-mlflow-model-registry-collaborative-hub-available-public-preview)\n+ [Load data from hundreds of data sources into Delta Lake using Stitch](https://docs.databricks.com/release-notes/product/2020/march.html#load-data-from-hundreds-of-data-sources-into-delta-lake-using-stitch)\n+ [Databricks Runtime 7.0 (Beta) previews Apache Spark 3.0](https://docs.databricks.com/release-notes/product/2020/march.html#databricks-runtime-70-beta-previews-apache-spark-30)\n+ [Databricks Runtime 6.5 ML (Beta)](https://docs.databricks.com/release-notes/product/2020/march.html#databricks-runtime-65-ml-beta)\n+ [Databricks Runtime 6.5 
(Beta)](https://docs.databricks.com/release-notes/product/2020/march.html#databricks-runtime-65-beta)\n+ [Optimized autoscaling on all-purpose clusters running Databricks Runtime 6.4 and above](https://docs.databricks.com/release-notes/product/2020/march.html#optimized-autoscaling-on-all-purpose-clusters-running-databricks-runtime-64-and-above)\n+ [Single-sign-on (SSO) now available on all pricing plans](https://docs.databricks.com/release-notes/product/2020/march.html#single-sign-on-sso-now-available-on-all-pricing-plans)\n+ [Develop and test Shiny applications inside RStudio Server](https://docs.databricks.com/release-notes/product/2020/march.html#develop-and-test-shiny-applications-inside-rstudio-server)\n+ [Change the default language of a notebook](https://docs.databricks.com/release-notes/product/2020/march.html#change-the-default-language-of-a-notebook)\n+ [Databricks to add anonymized usage analytics](https://docs.databricks.com/release-notes/product/2020/march.html#databricks-to-add-anonymized-usage-analytics)\n+ [Databricks Connect now supports Databricks Runtime 6.4](https://docs.databricks.com/release-notes/product/2020/march.html#databricks-connect-now-supports-databricks-runtime-64)\n+ [Databricks Connect now supports Databricks Runtime 6.3](https://docs.databricks.com/release-notes/product/2020/march.html#databricks-connect-now-supports-databricks-runtime-63)\n* [February 2020](https://docs.databricks.com/release-notes/product/2020/february.html)\n+ [Databricks Runtime 6.4 for Genomics GA](https://docs.databricks.com/release-notes/product/2020/february.html#databricks-runtime-64-for-genomics-ga)\n+ [Databricks Runtime 6.4 ML GA](https://docs.databricks.com/release-notes/product/2020/february.html#databricks-runtime-64-ml-ga)\n+ [Databricks Runtime 6.4 GA](https://docs.databricks.com/release-notes/product/2020/february.html#databricks-runtime-64-ga)\n+ [The Clusters and Jobs UIs now reflect new cluster terminology and cluster 
pricing](https://docs.databricks.com/release-notes/product/2020/february.html#the-clusters-and-jobs-uis-now-reflect-new-cluster-terminology-and-cluster-pricing)\n+ [New interactive charts offer rich client-side interactions](https://docs.databricks.com/release-notes/product/2020/february.html#new-interactive-charts-offer-rich-client-side-interactions)\n+ [New data ingestion network adds partner integrations with Delta Lake (Public Preview)](https://docs.databricks.com/release-notes/product/2020/february.html#new-data-ingestion-network-adds-partner-integrations-with-delta-lake-public-preview)\n+ [Flags to manage workspace security and notebook features now available](https://docs.databricks.com/release-notes/product/2020/february.html#flags-to-manage-workspace-security-and-notebook-features-now-available)\n* [January 2020](https://docs.databricks.com/release-notes/product/2020/january.html)\n+ [All cluster and pool tags now propagate to usage reports](https://docs.databricks.com/release-notes/product/2020/january.html#all-cluster-and-pool-tags-now-propagate-to-usage-reports)\n+ [Cluster and pool tag propagation to EC2 instances is more accurate](https://docs.databricks.com/release-notes/product/2020/january.html#cluster-and-pool-tag-propagation-to-ec2-instances-is-more-accurate)\n+ [Databricks Runtime 6.3 for Genomics GA](https://docs.databricks.com/release-notes/product/2020/january.html#databricks-runtime-63-for-genomics-ga)\n+ [Databricks Runtime 6.3 ML GA](https://docs.databricks.com/release-notes/product/2020/january.html#databricks-runtime-63-ml-ga)\n+ [Databricks Runtime 6.3 GA](https://docs.databricks.com/release-notes/product/2020/january.html#databricks-runtime-63-ga)\n+ [Cluster worker machine images now use chrony for NTP](https://docs.databricks.com/release-notes/product/2020/january.html#cluster-worker-machine-images-now-use-chrony-for-ntp)\n+ [Cluster standard autoscaling step is now 
configurable](https://docs.databricks.com/release-notes/product/2020/january.html#cluster-standard-autoscaling-step-is-now-configurable)\n+ [SCIM API supports pagination for Get Users and Get Groups (Public Preview)](https://docs.databricks.com/release-notes/product/2020/january.html#scim-api-supports-pagination-for-get-users-and-get-groups-public-preview)\n+ [File browser swimlane widths increased to 240px](https://docs.databricks.com/release-notes/product/2020/january.html#file-browser-swimlane-widths-increased-to-240px)\n+ [Databricks Runtime 3.5 LTS support ends](https://docs.databricks.com/release-notes/product/2020/january.html#databricks-runtime-35-lts-support-ends)\n* [December 2019](https://docs.databricks.com/release-notes/product/2019/december.html)\n+ [Databricks Connect now supports Databricks Runtime 6.2](https://docs.databricks.com/release-notes/product/2019/december.html#databricks-connect-now-supports-databricks-runtime-62)\n+ [Databricks Runtime 6.2 for Genomics GA](https://docs.databricks.com/release-notes/product/2019/december.html#databricks-runtime-62-for-genomics-ga)\n+ [Azure Databricks SCIM provisioning connector available in the app gallery](https://docs.databricks.com/release-notes/product/2019/december.html#azure-databricks-scim-provisioning-connector-available-in-the-app-gallery)\n+ [Databricks Runtime 5.3 and 5.4 support ends](https://docs.databricks.com/release-notes/product/2019/december.html#databricks-runtime-53-and-54-support-ends)\n+ [Databricks Runtime 6.2 ML GA](https://docs.databricks.com/release-notes/product/2019/december.html#databricks-runtime-62-ml-ga)\n+ [Databricks Runtime 6.2 GA](https://docs.databricks.com/release-notes/product/2019/december.html#databricks-runtime-62-ga)\n+ [Databricks Connect now supports Databricks Runtime 6.1](https://docs.databricks.com/release-notes/product/2019/december.html#databricks-connect-now-supports-databricks-runtime-61)\n* [November 
2019](https://docs.databricks.com/release-notes/product/2019/november.html)\n+ [Databricks Runtime 6.2 ML Beta](https://docs.databricks.com/release-notes/product/2019/november.html#databricks-runtime-62-ml-beta)\n+ [Databricks Runtime 6.2 Beta](https://docs.databricks.com/release-notes/product/2019/november.html#databricks-runtime-62-beta)\n+ [Configure clusters with your own container image using Databricks Container Services](https://docs.databricks.com/release-notes/product/2019/november.html#configure-clusters-with-your-own-container-image-using-databricks-container-services)\n+ [Cluster detail now shows only cluster ID in the HTTP path](https://docs.databricks.com/release-notes/product/2019/november.html#cluster-detail-now-shows-only-cluster-id-in-the-http-path)\n+ [Secrets referenced by Spark configuration properties and environment variables (Public Preview)](https://docs.databricks.com/release-notes/product/2019/november.html#secrets-referenced-by-spark-configuration-properties-and-environment-variables-public-preview)\n* [October 2019](https://docs.databricks.com/release-notes/product/2019/october.html)\n+ [Databricks Runtime 6.1 for Genomics GA](https://docs.databricks.com/release-notes/product/2019/october.html#databricks-runtime-61-for-genomics-ga)\n+ [Databricks Runtime 6.1 for Machine Learning GA](https://docs.databricks.com/release-notes/product/2019/october.html#databricks-runtime-61-for-machine-learning-ga)\n+ [MLflow API calls are now rate limited](https://docs.databricks.com/release-notes/product/2019/october.html#mlflow-api-calls-are-now-rate-limited)\n+ [Pools of instances for quick cluster launch generally available](https://docs.databricks.com/release-notes/product/2019/october.html#pools-of-instances-for-quick-cluster-launch-generally-available)\n+ [New instance types (Beta)](https://docs.databricks.com/release-notes/product/2019/october.html#new-instance-types-beta)\n+ [Databricks Runtime 6.1 
GA](https://docs.databricks.com/release-notes/product/2019/october.html#databricks-runtime-61-ga)\n+ [Databricks Runtime 6.0 for Genomics GA](https://docs.databricks.com/release-notes/product/2019/october.html#databricks-runtime-60-for-genomics-ga)\n+ [Non-admin Databricks users can read user and group names and IDs using SCIM API](https://docs.databricks.com/release-notes/product/2019/october.html#non-admin-databricks-users-can-read-user-and-group-names-and-ids-using-scim-api)\n+ [Workspace API returns notebook and folder object IDs](https://docs.databricks.com/release-notes/product/2019/october.html#workspace-api-returns-notebook-and-folder-object-ids)\n+ [Databricks Runtime 6.0 ML GA](https://docs.databricks.com/release-notes/product/2019/october.html#databricks-runtime-60-ml-ga)\n+ [Databricks Runtime 6.0 GA](https://docs.databricks.com/release-notes/product/2019/october.html#databricks-runtime-60-ga)\n+ [Account usage reports now show usage by user name](https://docs.databricks.com/release-notes/product/2019/october.html#account-usage-reports-now-show-usage-by-user-name)\n* [September 2019](https://docs.databricks.com/release-notes/product/2019/september.html)\n+ [Databricks Runtime 5.2 support ends](https://docs.databricks.com/release-notes/product/2019/september.html#databricks-runtime-52-support-ends)\n+ [Launch pool-backed automated clusters that use Databricks Light (Public Preview)](https://docs.databricks.com/release-notes/product/2019/september.html#launch-pool-backed-automated-clusters-that-use-databricks-light-public-preview)\n+ [Beta support for m5a and r5a instances](https://docs.databricks.com/release-notes/product/2019/september.html#beta-support-for-m5a-and-r5a-instances)\n+ [pandas DataFrames now render in notebooks without scaling](https://docs.databricks.com/release-notes/product/2019/september.html#pandas-dataframes-now-render-in-notebooks-without-scaling)\n+ [Python version selector display now 
dynamic](https://docs.databricks.com/release-notes/product/2019/september.html#python-version-selector-display-now-dynamic)\n+ [Databricks Runtime 6.0 Beta](https://docs.databricks.com/release-notes/product/2019/september.html#databricks-runtime-60-beta)\n* [August 2019](https://docs.databricks.com/release-notes/product/2019/august.html)\n+ [Workspace library installation enhancement](https://docs.databricks.com/release-notes/product/2019/august.html#workspace-library-installation-enhancement)\n+ [Clusters UI now reflects more consistent *interactive* and *automated* cluster terminology](https://docs.databricks.com/release-notes/product/2019/august.html#clusters-ui-now-reflects-more-consistent-interactive-and-automated-cluster-terminology)\n+ [Databricks Runtime 5.5 and Databricks Runtime 5.5 ML are LTS](https://docs.databricks.com/release-notes/product/2019/august.html#databricks-runtime-55-and-databricks-runtime-55-ml-are-lts)\n+ [Instance allocation notifications for pools](https://docs.databricks.com/release-notes/product/2019/august.html#instance-allocation-notifications-for-pools)\n+ [New cluster events](https://docs.databricks.com/release-notes/product/2019/august.html#new-cluster-events)\n+ [MLflow updates](https://docs.databricks.com/release-notes/product/2019/august.html#mlflow-updates)\n* [July 2019](https://docs.databricks.com/release-notes/product/2019/july.html)\n+ [Coming soon: Databricks 6.0 will not support Python 2](https://docs.databricks.com/release-notes/product/2019/july.html#coming-soon-databricks-60-will-not-support-python-2)\n+ [Ideas Portal](https://docs.databricks.com/release-notes/product/2019/july.html#ideas-portal)\n+ [Preload the Databricks Runtime version on pool idle instances](https://docs.databricks.com/release-notes/product/2019/july.html#preload-the-databricks-runtime-version-on-pool-idle-instances)\n+ [Custom cluster tags and pool tags play better 
together](https://docs.databricks.com/release-notes/product/2019/july.html#custom-cluster-tags-and-pool-tags-play-better-together)\n+ [MLflow 1.1 brings several UI and API improvements](https://docs.databricks.com/release-notes/product/2019/july.html#mlflow-11-brings-several-ui-and-api-improvements)\n+ [pandas DataFrame display renders like it does in Jupyter](https://docs.databricks.com/release-notes/product/2019/july.html#pandas-dataframe-display-renders-like-it-does-in-jupyter)\n+ [New regions](https://docs.databricks.com/release-notes/product/2019/july.html#new-regions)\n+ [Databricks Runtime 5.5 with Conda (Beta)](https://docs.databricks.com/release-notes/product/2019/july.html#databricks-runtime-55-with-conda-beta)\n+ [Set permissions on pools (Public Preview)](https://docs.databricks.com/release-notes/product/2019/july.html#set-permissions-on-pools-public-preview)\n+ [Databricks Runtime 5.5 for Machine Learning](https://docs.databricks.com/release-notes/product/2019/july.html#databricks-runtime-55-for-machine-learning)\n+ [Databricks Runtime 5.5](https://docs.databricks.com/release-notes/product/2019/july.html#databricks-runtime-55)\n+ [Keep a pool of instances on standby for quick cluster launch (Public Preview)](https://docs.databricks.com/release-notes/product/2019/july.html#keep-a-pool-of-instances-on-standby-for-quick-cluster-launch-public-preview)\n+ [Global series color](https://docs.databricks.com/release-notes/product/2019/july.html#global-series-color)\n* [June 2019](https://docs.databricks.com/release-notes/product/2019/june.html)\n+ [Account usage chart updated to display usage grouped by workload type](https://docs.databricks.com/release-notes/product/2019/june.html#account-usage-chart-updated-to-display-usage-grouped-by-workload-type)\n+ [RStudio integration no longer limited to high concurrency clusters](https://docs.databricks.com/release-notes/product/2019/june.html#rstudio-integration-no-longer-limited-to-high-concurrency-clusters)\n+ 
[MLflow 1.0](https://docs.databricks.com/release-notes/product/2019/june.html#mlflow-10)\n+ [Databricks Runtime 5.4 with Conda (Beta)](https://docs.databricks.com/release-notes/product/2019/june.html#databricks-runtime-54-with-conda-beta)\n+ [Databricks Runtime 5.4 for Machine Learning](https://docs.databricks.com/release-notes/product/2019/june.html#databricks-runtime-54-for-machine-learning)\n+ [Databricks Runtime 5.4](https://docs.databricks.com/release-notes/product/2019/june.html#databricks-runtime-54)\n* [May 2019](https://docs.databricks.com/release-notes/product/2019/may.html)\n+ [Cluster event log filtering](https://docs.databricks.com/release-notes/product/2019/may.html#cluster-event-log-filtering)\n+ [JDBC/ODBC connectivity available without Premium plan or above](https://docs.databricks.com/release-notes/product/2019/may.html#jdbcodbc-connectivity-available-without-premium-plan-or-above)\n* [April 2019](https://docs.databricks.com/release-notes/product/2019/april.html)\n+ [MLflow on Databricks (GA)](https://docs.databricks.com/release-notes/product/2019/april.html#mlflow-on-databricks-ga)\n+ [Delta Lake on Databricks](https://docs.databricks.com/release-notes/product/2019/april.html#delta-lake-on-databricks)\n+ [MLflow runs sidebar](https://docs.databricks.com/release-notes/product/2019/april.html#mlflow-runs-sidebar)\n+ [C5d series Amazon EC2 instance types (Beta)](https://docs.databricks.com/release-notes/product/2019/april.html#c5d-series-amazon-ec2-instance-types-beta)\n+ [Databricks Runtime 5.3 (GA)](https://docs.databricks.com/release-notes/product/2019/april.html#databricks-runtime-53-ga)\n+ [Databricks Runtime 5.3 ML (GA)](https://docs.databricks.com/release-notes/product/2019/april.html#databricks-runtime-53-ml-ga)\n* [March 2019](https://docs.databricks.com/release-notes/product/2019/march.html)\n+ [Purge deleted MLflow experiments and 
runs](https://docs.databricks.com/release-notes/product/2019/march.html#purge-deleted-mlflow-experiments-and-runs)\n+ [Databricks Light generally available](https://docs.databricks.com/release-notes/product/2019/march.html#databricks-light-generally-available)\n+ [Searchable cluster selector](https://docs.databricks.com/release-notes/product/2019/march.html#searchable-cluster-selector)\n+ [Upcoming usage display changes](https://docs.databricks.com/release-notes/product/2019/march.html#upcoming-usage-display-changes)\n+ [Manage groups from the Admin Console](https://docs.databricks.com/release-notes/product/2019/march.html#manage-groups-from-the-admin-console)\n+ [Notebooks automatically have associated MLflow experiment](https://docs.databricks.com/release-notes/product/2019/march.html#notebooks-automatically-have-associated-mlflow-experiment)\n+ [Z1d series Amazon EC2 instance types (Beta)](https://docs.databricks.com/release-notes/product/2019/march.html#z1d-series-amazon-ec2-instance-types-beta)\n+ [Two private IP addresses per node](https://docs.databricks.com/release-notes/product/2019/march.html#two-private-ip-addresses-per-node)\n+ [Databricks Delta public community](https://docs.databricks.com/release-notes/product/2019/march.html#databricks-delta-public-community)\n* [February 2019](https://docs.databricks.com/release-notes/product/2019/february.html)\n+ [Managed MLflow on Databricks Public Preview](https://docs.databricks.com/release-notes/product/2019/february.html#managed-mlflow-on-databricks-public-preview)\n+ [Azure Data Lake Storage Gen2 connector is generally available](https://docs.databricks.com/release-notes/product/2019/february.html#azure-data-lake-storage-gen2-connector-is-generally-available)\n+ [Python 3 now the default when you create clusters](https://docs.databricks.com/release-notes/product/2019/february.html#python-3-now-the-default-when-you-create-clusters)\n+ [Additional cluster instance 
types](https://docs.databricks.com/release-notes/product/2019/february.html#additional-cluster-instance-types)\n+ [Delta Lake generally available](https://docs.databricks.com/release-notes/product/2019/february.html#delta-lake-generally-available)\n* [January 2019](https://docs.databricks.com/release-notes/product/2019/january.html)\n+ [Upcoming change: Python 3 to become the default when you create clusters](https://docs.databricks.com/release-notes/product/2019/january.html#upcoming-change-python-3-to-become-the-default-when-you-create-clusters)\n+ [Databricks Runtime 5.2 for Machine Learning (Beta) release](https://docs.databricks.com/release-notes/product/2019/january.html#databricks-runtime-52-for-machine-learning-beta-release)\n+ [Cluster configuration JSON view](https://docs.databricks.com/release-notes/product/2019/january.html#cluster-configuration-json-view)\n+ [Library UI](https://docs.databricks.com/release-notes/product/2019/january.html#library-ui)\n+ [Cluster Events](https://docs.databricks.com/release-notes/product/2019/january.html#cluster-events)\n+ [Cluster UI](https://docs.databricks.com/release-notes/product/2019/january.html#cluster-ui)\n* [December 2018](https://docs.databricks.com/release-notes/product/2018/december.html)\n+ [Databricks Runtime 5.1 for Machine Learning (Beta) release](https://docs.databricks.com/release-notes/product/2018/december.html#databricks-runtime-51-for-machine-learning-beta-release)\n+ [Databricks Runtime 5.1 release](https://docs.databricks.com/release-notes/product/2018/december.html#databricks-runtime-51-release)\n+ [Library UI](https://docs.databricks.com/release-notes/product/2018/december.html#library-ui)\n* [November 2018](https://docs.databricks.com/release-notes/product/2018/november.html)\n+ [Library UI](https://docs.databricks.com/release-notes/product/2018/november.html#library-ui)\n+ [Custom Spark heap memory settings 
enabled](https://docs.databricks.com/release-notes/product/2018/november.html#custom-spark-heap-memory-settings-enabled)\n+ [Jobs and idle execution context eviction](https://docs.databricks.com/release-notes/product/2018/november.html#jobs-and-idle-execution-context-eviction)\n+ [Databricks Runtime 5.0 for Machine Learning (Beta) release](https://docs.databricks.com/release-notes/product/2018/november.html#databricks-runtime-50-for-machine-learning-beta-release)\n+ [Databricks Runtime 5.0 release](https://docs.databricks.com/release-notes/product/2018/november.html#databricks-runtime-50-release)\n+ [`displayHTML` support for unrestricted loading of third-party content](https://docs.databricks.com/release-notes/product/2018/november.html#displayhtml-support-for-unrestricted-loading-of-third-party-content)\n* [October 2018](https://docs.databricks.com/release-notes/product/2018/october.html)\n+ [SCIM provisioning using OneLogin](https://docs.databricks.com/release-notes/product/2018/october.html#scim-provisioning-using-onelogin)\n+ [Copy notebook file path without opening notebook](https://docs.databricks.com/release-notes/product/2018/october.html#copy-notebook-file-path-without-opening-notebook)\n* [September 2018](https://docs.databricks.com/release-notes/product/2018/september.html)\n+ [SCIM provisioning using Okta and Microsoft Entra ID (Preview)](https://docs.databricks.com/release-notes/product/2018/september.html#scim-provisioning-using-okta-and-microsoft-entra-id-preview)\n+ [EBS leaked volumes deletion](https://docs.databricks.com/release-notes/product/2018/september.html#ebs-leaked-volumes-deletion)\n+ [Support for r5 instances](https://docs.databricks.com/release-notes/product/2018/september.html#support-for-r5-instances)\n+ [SCIM API for provisioning users and groups (Preview)](https://docs.databricks.com/release-notes/product/2018/september.html#scim-api-for-provisioning-users-and-groups-preview)\n* [August 
2018](https://docs.databricks.com/release-notes/product/2018/august.html)\n+ [Workspace sidebar redesign](https://docs.databricks.com/release-notes/product/2018/august.html#workspace-sidebar-redesign)\n+ [New environment variables in init scripts](https://docs.databricks.com/release-notes/product/2018/august.html#new-environment-variables-in-init-scripts)\n+ [EBS leaked volumes logging and deletion](https://docs.databricks.com/release-notes/product/2018/august.html#ebs-leaked-volumes-logging-and-deletion)\n+ [AWS r3 and c3 instance types now deprecated](https://docs.databricks.com/release-notes/product/2018/august.html#aws-r3-and-c3-instance-types-now-deprecated)\n+ [Audit logging for ACL changes](https://docs.databricks.com/release-notes/product/2018/august.html#audit-logging-for-acl-changes)\n+ [Cluster-scoped init scripts](https://docs.databricks.com/release-notes/product/2018/august.html#cluster-scoped-init-scripts)\n+ [Collapsible headings](https://docs.databricks.com/release-notes/product/2018/august.html#collapsible-headings)\n* [July 2018](https://docs.databricks.com/release-notes/product/2018/july.html)\n+ [Libraries API supports Python wheel files](https://docs.databricks.com/release-notes/product/2018/july.html#libraries-api-supports-python-wheel-files)\n+ [IPython notebook export](https://docs.databricks.com/release-notes/product/2018/july.html#ipython-notebook-export)\n+ [New instance types (beta)](https://docs.databricks.com/release-notes/product/2018/july.html#new-instance-types-beta)\n+ [Cluster mode and High Concurrency clusters](https://docs.databricks.com/release-notes/product/2018/july.html#cluster-mode-and-high-concurrency-clusters)\n+ [Table access control](https://docs.databricks.com/release-notes/product/2018/july.html#table-access-control)\n+ [RStudio integration](https://docs.databricks.com/release-notes/product/2018/july.html#rstudio-integration)\n+ [R Markdown 
support](https://docs.databricks.com/release-notes/product/2018/july.html#r-markdown-support)\n+ [Home page redesign, with ability to drop files to import data](https://docs.databricks.com/release-notes/product/2018/july.html#home-page-redesign-with-ability-to-drop-files-to-import-data)\n+ [Widget default behavior](https://docs.databricks.com/release-notes/product/2018/july.html#widget-default-behavior)\n+ [Table creation UI](https://docs.databricks.com/release-notes/product/2018/july.html#table-creation-ui)\n+ [Multi-line JSON data import](https://docs.databricks.com/release-notes/product/2018/july.html#multi-line-json-data-import)\n* [June 2018](https://docs.databricks.com/release-notes/product/2018/june.html)\n+ [Cluster log purge](https://docs.databricks.com/release-notes/product/2018/june.html#cluster-log-purge)\n+ [Trash folder](https://docs.databricks.com/release-notes/product/2018/june.html#trash-folder)\n+ [Reduced log retention period](https://docs.databricks.com/release-notes/product/2018/june.html#reduced-log-retention-period)\n+ [Gzipped API responses](https://docs.databricks.com/release-notes/product/2018/june.html#gzipped-api-responses)\n+ [Table import UI](https://docs.databricks.com/release-notes/product/2018/june.html#table-import-ui)\n* [May 2018](https://docs.databricks.com/release-notes/product/2018/may.html)\n+ [General Data Protection Regulation (GDPR)](https://docs.databricks.com/release-notes/product/2018/may.html#general-data-protection-regulation-gdpr)\n+ [HorovodEstimator](https://docs.databricks.com/release-notes/product/2018/may.html#horovodestimator)\n+ [MLeap ML Model Export](https://docs.databricks.com/release-notes/product/2018/may.html#mleap-ml-model-export)\n+ [Notebook cells: hide and show](https://docs.databricks.com/release-notes/product/2018/may.html#notebook-cells-hide-and-show)\n+ [Doc site search](https://docs.databricks.com/release-notes/product/2018/may.html#doc-site-search)\n+ [Databricks Runtime 4.1 for Machine 
Learning (Beta)](https://docs.databricks.com/release-notes/product/2018/may.html#databricks-runtime-41-for-machine-learning-beta)\n+ [New GPU cluster types](https://docs.databricks.com/release-notes/product/2018/may.html#new-gpu-cluster-types)\n+ [Secret management](https://docs.databricks.com/release-notes/product/2018/may.html#secret-management)\n+ [Cluster pinning](https://docs.databricks.com/release-notes/product/2018/may.html#cluster-pinning)\n+ [Cluster autostart](https://docs.databricks.com/release-notes/product/2018/may.html#cluster-autostart)\n+ [Workspace purging](https://docs.databricks.com/release-notes/product/2018/may.html#workspace-purging)\n+ [Databricks CLI 0.7.1](https://docs.databricks.com/release-notes/product/2018/may.html#databricks-cli-071)\n+ [Display() support for image data types](https://docs.databricks.com/release-notes/product/2018/may.html#display-support-for-image-data-types)\n+ [Databricks Delta update](https://docs.databricks.com/release-notes/product/2018/may.html#databricks-delta-update)\n+ [S3 Select connector](https://docs.databricks.com/release-notes/product/2018/may.html#s3-select-connector)\n* [April 2018](https://docs.databricks.com/release-notes/product/2018/april.html)\n+ [AWS account updates](https://docs.databricks.com/release-notes/product/2018/april.html#aws-account-updates)\n+ [Spark error tips](https://docs.databricks.com/release-notes/product/2018/april.html#spark-error-tips)\n+ [Databricks CLI 0.7.0](https://docs.databricks.com/release-notes/product/2018/april.html#databricks-cli-070)\n+ [Increase init script output truncation limit](https://docs.databricks.com/release-notes/product/2018/april.html#increase-init-script-output-truncation-limit)\n+ [Clusters API: added UPSIZE\\_COMPLETED event type](https://docs.databricks.com/release-notes/product/2018/april.html#clusters-api-added-upsize_completed-event-type)\n+ [Command 
autocomplete](https://docs.databricks.com/release-notes/product/2018/april.html#command-autocomplete)\n+ [Serverless pools upgraded to Databricks Runtime 4.0](https://docs.databricks.com/release-notes/product/2018/april.html#serverless-pools-upgraded-to-databricks-runtime-40)\n* [March 2018](https://docs.databricks.com/release-notes/product/2018/march.html)\n+ [Command execution details](https://docs.databricks.com/release-notes/product/2018/march.html#command-execution-details)\n+ [Databricks CLI 0.6.1 supports `--profile`](https://docs.databricks.com/release-notes/product/2018/march.html#databricks-cli-061-supports---profile)\n+ [ACLs enabled by default for new Operational Security customers](https://docs.databricks.com/release-notes/product/2018/march.html#acls-enabled-by-default-for-new-operational-security-customers)\n+ [New doc site theme](https://docs.databricks.com/release-notes/product/2018/march.html#new-doc-site-theme)\n+ [Cluster event log](https://docs.databricks.com/release-notes/product/2018/march.html#cluster-event-log)\n+ [Databricks CLI: 0.6.0 release](https://docs.databricks.com/release-notes/product/2018/march.html#databricks-cli-060-release)\n+ [Job run management](https://docs.databricks.com/release-notes/product/2018/march.html#job-run-management)\n+ [Edit cluster permissions now requires edit mode](https://docs.databricks.com/release-notes/product/2018/march.html#edit-cluster-permissions-now-requires-edit-mode)\n+ [Databricks ML Model Export](https://docs.databricks.com/release-notes/product/2018/march.html#databricks-ml-model-export)\n* [February 2018](https://docs.databricks.com/release-notes/product/2018/february.html)\n+ [New line chart supports time-series data](https://docs.databricks.com/release-notes/product/2018/february.html#new-line-chart-supports-time-series-data)\n+ [More visualization improvements](https://docs.databricks.com/release-notes/product/2018/february.html#more-visualization-improvements)\n+ [Delete job runs using Job 
API](https://docs.databricks.com/release-notes/product/2018/february.html#delete-job-runs-using-job-api)\n+ [Bring your own S3 bucket](https://docs.databricks.com/release-notes/product/2018/february.html#bring-your-own-s3-bucket)\n+ [KaTeX math rendering library updated](https://docs.databricks.com/release-notes/product/2018/february.html#katex-math-rendering-library-updated)\n+ [Databricks CLI: 0.5.0 release](https://docs.databricks.com/release-notes/product/2018/february.html#databricks-cli-050-release)\n+ [DBUtils API library](https://docs.databricks.com/release-notes/product/2018/february.html#dbutils-api-library)\n+ [Filter for your jobs only](https://docs.databricks.com/release-notes/product/2018/february.html#filter-for-your-jobs-only)\n+ [Spark-submit from the Create Job page](https://docs.databricks.com/release-notes/product/2018/february.html#spark-submit-from-the-create-job-page)\n+ [Select Python 3 from the Create Cluster page](https://docs.databricks.com/release-notes/product/2018/february.html#select-python-3-from-the-create-cluster-page)\n+ [Workspace UI improvements](https://docs.databricks.com/release-notes/product/2018/february.html#workspace-ui-improvements)\n+ [Autocomplete for SQL commands and database names](https://docs.databricks.com/release-notes/product/2018/february.html#autocomplete-for-sql-commands-and-database-names)\n+ [Serverless pools now support R](https://docs.databricks.com/release-notes/product/2018/february.html#serverless-pools-now-support-r)\n+ [Distributed TensorFlow and Keras Libraries Support](https://docs.databricks.com/release-notes/product/2018/february.html#distributed-tensorflow-and-keras-libraries-support)\n+ [XGBoost available as a Spark Package](https://docs.databricks.com/release-notes/product/2018/february.html#xgboost-available-as-a-spark-package)\n+ [Table access control for SQL and Python 
(Beta)](https://docs.databricks.com/release-notes/product/2018/february.html#table-access-control-for-sql-and-python-beta)\n* [January 2018](https://docs.databricks.com/release-notes/product/2018/january.html)\n+ [Mount points for Azure Blob storage containers and Data Lake Stores](https://docs.databricks.com/release-notes/product/2018/january.html#mount-points-for-azure-blob-storage-containers-and-data-lake-stores)\n+ [Table Access Control for SQL and Python (Private Preview)](https://docs.databricks.com/release-notes/product/2018/january.html#table-access-control-for-sql-and-python-private-preview)\n+ [Exporting notebook job run results via API](https://docs.databricks.com/release-notes/product/2018/january.html#exporting-notebook-job-run-results-via-api)\n+ [Apache Airflow 1.9.0 includes Databricks integration](https://docs.databricks.com/release-notes/product/2018/january.html#apache-airflow-190-includes-databricks-integration)\n\n", "chunk_id": "3c7d1aad122c45b35c16a852e6cc35b5", "url": "https://docs.databricks.com/release-notes/product/index.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `schema_of_json_agg` aggregate function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 13.2 and later \nReturns the combined schema of all JSON strings in a group in DDL format.\n\n####### `schema_of_json_agg` aggregate function\n######## Syntax\n\n```\nschema_of_json_agg(jsonStr [, options] ) [FILTER ( WHERE cond ) ]\n\n``` \nThis function can also be invoked as a [window function](https://docs.databricks.com/sql/language-manual/sql-ref-window-functions.html) using the `OVER` clause.\n\n####### `schema_of_json_agg` aggregate function\n######## Arguments\n\n* `jsonStr`: A `STRING` literal with `JSON`.\n* `options`: 
An optional `MAP` literal with keys and values being `STRING`. For details on options, see [from\\_json function](https://docs.databricks.com/sql/language-manual/functions/from_json.html).\n* `cond`: An optional `BOOLEAN` expression filtering the rows used for aggregation.\n\n", "chunk_id": "7ff6c6edf9a27b3a5a3e257c98f2466f", "url": "https://docs.databricks.com/sql/language-manual/functions/schema_of_json_agg.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `schema_of_json_agg` aggregate function\n######## Returns\n\nA `STRING` holding a definition of an array of structs with `n` fields of strings where the column names are derived from the distinct set of `JSON` keys .\nThe field values hold the derived formatted SQL types. \nThe schema of each record is merged together by field name.\nWhen two fields with the same name have a different type across records, Databricks uses the [least common type](https://docs.databricks.com/sql/language-manual/sql-ref-datatype-rules.html#least-common-type-resolution).\nWhen no such type exists, the type is derived as a `STRING`.\nFor example, `INT` and `DOUBLE` become `DOUBLE`, while `STRUCT` and `STRING` become `STRING`. \nThe schema obtained from reading a column of `JSON` data is the same as the one derived from the following. 
\n```\nSELECT * FROM json.`/my/data`;\n\n``` \nTo derive the schema of a single `JSON` string, use [schema\\_of\\_json function](https://docs.databricks.com/sql/language-manual/functions/schema_of_json.html).\n\n", "chunk_id": "b453b9d5463f595f838b0386f0c05777", "url": "https://docs.databricks.com/sql/language-manual/functions/schema_of_json_agg.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `schema_of_json_agg` aggregate function\n######## Examples\n\n```\n> SELECT schema_of_json_agg(a) FROM VALUES('{\"foo\": \"bar\"}') AS data(a);\nSTRUCT\n\n> SELECT schema_of_json_agg(a) FROM VALUES('[1]') AS data(a);\nARRAY\n\n> CREATE TEMPORARY VIEW data(a) AS VALUES\n('{\"foo\": \"bar\", \"wing\": {\"ding\": \"dong\"}}'),\n('{\"top\": \"level\", \"wing\": {\"stop\": \"go\"}}')\n\n> SELECT schema_of_json_agg(a) FROM data;\nSTRUCT>\n\n```\n\n####### `schema_of_json_agg` aggregate function\n######## Related functions\n\n* [from\\_json function](https://docs.databricks.com/sql/language-manual/functions/from_json.html)\n* [schema\\_of\\_json function](https://docs.databricks.com/sql/language-manual/functions/schema_of_json.html)\n\n", "chunk_id": "f084a363c9989c5c0e54d0b83053b6c8", "url": "https://docs.databricks.com/sql/language-manual/functions/schema_of_json_agg.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `cloud_files_state` table-valued function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 11.3 LTS and above \nReturns the file-level state of an Auto Loader or `read_files` stream.\n\n####### `cloud_files_state` table-valued function\n######## Syntax\n\n```\ncloud_files_state( { TABLE ( 
table_name ) | checkpoint } )\n\n```\n\n####### `cloud_files_state` table-valued function\n######## Arguments\n\n* [table\\_name](https://docs.databricks.com/sql/language-manual/sql-ref-names.html#table-name): The identifier of the [streaming table](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-streaming-table.html) that\u2019s being written to by `read_files`. The name must not include a temporal specification. Available in Databricks Runtime 13.3 LTS and above.\n* `checkpoint`: A `STRING` literal. The checkpoint directory for a stream using the Auto Loader source. See [What is Auto Loader?](https://docs.databricks.com/ingestion/auto-loader/index.html).\n\n", "chunk_id": "0e777efb2d712040d5f73405223e3417", "url": "https://docs.databricks.com/sql/language-manual/functions/cloud_files_state.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `cloud_files_state` table-valued function\n######## Returns\n\nReturns a table with the following schema: \n* `path STRING NOT NULL PRIMARY KEY` \nThe path of a file.\n* `size BIGINT NOT NULL` \nThe size of a file in bytes.\n* `create_time TIMESTAMP NOT NULL` \nThe time that a file was created.\n* `discovery_time TIMESTAMP NOT NULL` \nPreview \nThis feature is in [Private Preview](https://docs.databricks.com/release-notes/release-types.html). To try it, reach out to your Databricks contact. \nThe time that a file was discovered.\n* `commit_time TIMESTAMP` \nPreview \nThis feature is in [Private Preview](https://docs.databricks.com/release-notes/release-types.html). To try it, reach out to your Databricks contact. \nThe time that a file was committed to the checkpoint after processing.\n`NULL` if the file is not yet processed. A file might be processed, but might be\nmarked as committed arbitrarily later. 
Marking the file as committed means that\nAuto Loader does not require the file for processing again.\n* `archive_time TIMESTAMP` \nPreview \nThis feature is in [Private Preview](https://docs.databricks.com/release-notes/release-types.html). To try it, reach out to your Databricks contact. \nThe time that a file was archived. `NULL` if the file has not been archived.\n* `source_id STRING` \nThe ID of the Auto Loader source in the streaming query. This value is `'0'` for streams that ingest from a\nsingle cloud object store location.\n\n####### `cloud_files_state` table-valued function\n######## Permissions\n\nYou need to have: \n* `OWNER` privileges on the streaming table if using a streaming table identifier.\n* `READ FILES` privileges on the checkpoint location if providing a checkpoint under an [external location](https://docs.databricks.com/connect/unity-catalog/external-locations.html).\n\n", "chunk_id": "f05939cf60b66e9fd604a99ddfb1af3e", "url": "https://docs.databricks.com/sql/language-manual/functions/cloud_files_state.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `cloud_files_state` table-valued function\n######## Examples\n\n```\n-- Simple example from checkpoint\n> SELECT path FROM CLOUD_FILES_STATE('/some/checkpoint');\n/some/input/path\n/other/input/path\n\n-- Simple example from source subdir\n> SELECT path FROM CLOUD_FILES_STATE('/some/checkpoint/sources/0');\n/some/input/path\n/other/input/path\n\n-- Simple example from streaming table\n> SELECT path FROM CLOUD_FILES_STATE(TABLE(my_streaming_table));\n/some/input/path\n/other/input/path\n\n```\n\n####### `cloud_files_state` table-valued function\n######## Related articles\n\n* [Load data with Delta Live Tables](https://docs.databricks.com/delta-live-tables/load.html)\n* [read\\_files table-valued 
function](https://docs.databricks.com/sql/language-manual/functions/read_files.html)\n\n", "chunk_id": "afb54bbe47023762bcd1d17c5093ea24", "url": "https://docs.databricks.com/sql/language-manual/functions/cloud_files_state.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `ai_mask` function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL \nPreview \nThis feature is in [Public Preview](https://docs.databricks.com/release-notes/release-types.html). \nIn the preview, \n* The underlying language model can handle several languages, however these functions are tuned for English.\n* There is rate limiting for the underlying Foundation Model APIs, see [Foundation Model APIs limits](https://docs.databricks.com/machine-learning/model-serving/model-serving-limits.html#fmapi-limits) to update these limits. \nThe `ai_mask()` function allows you to invoke a state-of-the-art generative AI model to mask specified entities in a given text using SQL. This function uses a chat model serving endpoint made available by [Databricks Foundation Model APIs](https://docs.databricks.com/machine-learning/foundation-models/index.html).\n\n####### `ai_mask` function\n######## Requirements\n\nImportant \nThe underlying models that might be used at this time are licensed under the Apache 2.0 license or Llama 2 community license. Databricks recommends reviewing these licenses to ensure compliance with any applicable terms. If models emerge in the future that perform better according to Databricks\u2019s internal benchmarks, Databricks may change the model (and the list of applicable licenses provided on this page). \nCurrently, **Mixtral-8x7B Instruct** is the underlying model that powers these AI functions. 
\n* This function is only available on workspaces in Foundation Model APIs [pay-per-token supported regions](https://docs.databricks.com/machine-learning/foundation-models/index.html#required).\n* This function is not available on Databricks SQL Classic.\n* Check the [Databricks SQL pricing page](https://www.databricks.com/product/pricing/databricks-sql).\n\n", "chunk_id": "4a66b5a72bd15fa7265f7113972dce6c", "url": "https://docs.databricks.com/sql/language-manual/functions/ai_mask.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `ai_mask` function\n######## Syntax\n\n```\nai_mask(content, labels)\n\n```\n\n####### `ai_mask` function\n######## Arguments\n\n* `content`: A `STRING` expression.\n* `labels`: An `ARRAY` literal. Each element represents a type of information to be masked.\n\n####### `ai_mask` function\n######## Returns\n\nA `STRING` where the specified information is masked. \nIf `content` is `NULL`, the result is `NULL`.\n\n####### `ai_mask` function\n######## Examples\n\n```\n> SELECT ai_mask(\n'John Doe lives in New York. His email is john.doe@example.com.',\narray('person', 'email')\n);\n\"[MASKED] lives in New York. 
His email is [MASKED].\"\n\n> SELECT ai_mask(\n'Contact me at 555-1234 or visit us at 123 Main St.',\narray('phone', 'address')\n);\n\"Contact me at [MASKED] or visit us at [MASKED]\"\n\n```\n\n####### `ai_mask` function\n######## Related functions\n\n* [ai\\_extract function](https://docs.databricks.com/sql/language-manual/functions/ai_extract.html)\n\n", "chunk_id": "f6eec70c066e294e261bf6390ab59266", "url": "https://docs.databricks.com/sql/language-manual/functions/ai_mask.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `xpath_string` function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nReturns the contents of the first XML node that matches the XPath expression.\n\n####### `xpath_string` function\n######## Syntax\n\n```\nxpath_string(xml, xpath)\n\n```\n\n####### `xpath_string` function\n######## Arguments\n\n* `xml`: A STRING expression of XML.\n* `xpath`: A STRING expression that is a well formed XPath.\n\n####### `xpath_string` function\n######## Returns\n\nThe result is STRING. 
\nThe function raises an error if `xml` or `xpath` is malformed.\n\n####### `xpath_string` function\n######## Examples\n\n```\n> SELECT xpath_string('<a><b>b</b><c>cc</c></a>','a/c');\ncc\n\n```\n\n", "chunk_id": "dcef98f0fd5965dddad1513ed93ca142", "url": "https://docs.databricks.com/sql/language-manual/functions/xpath_string.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `xpath_string` function\n######## Related functions\n\n* [xpath function](https://docs.databricks.com/sql/language-manual/functions/xpath.html)\n* [xpath\\_boolean function](https://docs.databricks.com/sql/language-manual/functions/xpath_boolean.html)\n* [xpath\\_double function](https://docs.databricks.com/sql/language-manual/functions/xpath_double.html)\n* [xpath\\_float function](https://docs.databricks.com/sql/language-manual/functions/xpath_float.html)\n* [xpath\\_long function](https://docs.databricks.com/sql/language-manual/functions/xpath_long.html)\n* [xpath\\_number function](https://docs.databricks.com/sql/language-manual/functions/xpath_number.html)\n* [xpath\\_int function](https://docs.databricks.com/sql/language-manual/functions/xpath_int.html)\n* [xpath\\_short function](https://docs.databricks.com/sql/language-manual/functions/xpath_short.html)\n\n", "chunk_id": "6a6d6bf59ee60f883ce3df4f80a9a754", "url": "https://docs.databricks.com/sql/language-manual/functions/xpath_string.html"} +{"chunked_text": "# Databricks data engineering\n## What is Delta Live Tables?\n### Delta Live Tables language references\n##### Delta Live Tables Python language reference\n\nThis article provides details for the Delta Live Tables Python programming interface. \nFor information on the SQL API, see the [Delta Live Tables SQL language reference](https://docs.databricks.com/delta-live-tables/sql-ref.html). 
\nFor details specific to configuring Auto Loader, see [What is Auto Loader?](https://docs.databricks.com/ingestion/auto-loader/index.html).\n\n##### Delta Live Tables Python language reference\n###### Limitations\n\nThe Delta Live Tables Python interface has the following limitations: \n* The Python `table` and `view` functions must return a DataFrame. Some functions that operate on DataFrames do not return DataFrames and should not be used. Because DataFrame transformations are executed *after* the full dataflow graph has been resolved, using such operations might have unintended side effects. These operations include functions such as `collect()`, `count()`, `toPandas()`, `save()`, and `saveAsTable()`. However, you can include these functions outside of `table` or `view` function definitions because this code is run once during the graph initialization phase.\n* The `pivot()` function is not supported. The `pivot` operation in Spark requires eager loading of input data to compute the schema of the output. This capability is not supported in Delta Live Tables.\n\n##### Delta Live Tables Python language reference\n###### Import the `dlt` Python module\n\nDelta Live Tables Python functions are defined in the `dlt` module. Your pipelines implemented with the Python API must import this module: \n```\nimport dlt\n\n```\n\n", "chunk_id": "b6557343a829e116f47631ce06494027", "url": "https://docs.databricks.com/delta-live-tables/python-ref.html"} +{"chunked_text": "# Databricks data engineering\n## What is Delta Live Tables?\n### Delta Live Tables language references\n##### Delta Live Tables Python language reference\n###### Create a Delta Live Tables materialized view or streaming table\n\nIn Python, Delta Live Tables determines whether to update a dataset as a materialized view or streaming table based on the defining query. The `@table` decorator is used to define both materialized views and streaming tables. 
\nTo define a materialized view in Python, apply `@table` to a query that performs a static read against a data source. To define a streaming table, apply `@table` to a query that performs a streaming read against a data source. Both dataset types have the same syntax specification as follows: \n```\nimport dlt\n\n@dlt.table(\nname=\"\",\ncomment=\"\",\nspark_conf={\"\" : \"\", \"\" : \"\"},\ntable_properties={\"\" : \"\", \"\" : \"\"},\npath=\"\",\npartition_cols=[\"\", \"\"],\nschema=\"schema-definition\",\ntemporary=False)\n@dlt.expect\n@dlt.expect_or_fail\n@dlt.expect_or_drop\n@dlt.expect_all\n@dlt.expect_all_or_drop\n@dlt.expect_all_or_fail\ndef ():\nreturn ()\n\n```\n\n", "chunk_id": "327158e616c03eb1d8fec377a069e0d1", "url": "https://docs.databricks.com/delta-live-tables/python-ref.html"} +{"chunked_text": "# Databricks data engineering\n## What is Delta Live Tables?\n### Delta Live Tables language references\n##### Delta Live Tables Python language reference\n###### Create a Delta Live Tables view\n\nTo define a view in Python, apply the `@view` decorator. Like the `@table` decorator, you can use views in Delta Live Tables for either static or streaming datasets. The following is the syntax for defining views with Python: \n```\nimport dlt\n\n@dlt.view(\nname=\"\",\ncomment=\"\")\n@dlt.expect\n@dlt.expect_or_fail\n@dlt.expect_or_drop\n@dlt.expect_all\n@dlt.expect_all_or_drop\n@dlt.expect_all_or_fail\ndef ():\nreturn ()\n\n```\n\n##### Delta Live Tables Python language reference\n###### Example: Define tables and views\n\nTo define a table or view in Python, apply the `@dlt.view` or `@dlt.table` decorator to a function. You can use the function name or the `name` parameter to assign the table or view name. 
The following example defines two different datasets: a view called `taxi_raw` that takes a JSON file as the input source and a table called `filtered_data` that takes the `taxi_raw` view as input: \n```\nimport dlt\n\n@dlt.view\ndef taxi_raw():\nreturn spark.read.format(\"json\").load(\"/databricks-datasets/nyctaxi/sample/json/\")\n\n# Use the function name as the table name\n@dlt.table\ndef filtered_data():\nreturn dlt.read(\"taxi_raw\").where(...)\n\n# Use the name parameter as the table name\n@dlt.table(\nname=\"filtered_data\")\ndef create_filtered_data():\nreturn dlt.read(\"taxi_raw\").where(...)\n\n```\n\n", "chunk_id": "dac9f139d02bf772d18f897918b58ce4", "url": "https://docs.databricks.com/delta-live-tables/python-ref.html"} +{"chunked_text": "# Databricks data engineering\n## What is Delta Live Tables?\n### Delta Live Tables language references\n##### Delta Live Tables Python language reference\n###### Example: Access a dataset defined in the same pipeline\n\nIn addition to reading from external data sources, you can access datasets defined in the same pipeline with the Delta Live Tables `read()` function. The following example demonstrates creating a `customers_filtered` dataset using the `read()` function: \n```\n@dlt.table\ndef customers_raw():\nreturn spark.read.format(\"csv\").load(\"/data/customers.csv\")\n\n@dlt.table\ndef customers_filteredA():\nreturn dlt.read(\"customers_raw\").where(...)\n\n``` \nYou can also use the `spark.table()` function to access a dataset defined in the same pipeline. 
When using the `spark.table()` function to access a dataset defined in the pipeline, in the function argument prepend the `LIVE` keyword to the dataset name: \n```\n@dlt.table\ndef customers_raw():\nreturn spark.read.format(\"csv\").load(\"/data/customers.csv\")\n\n@dlt.table\ndef customers_filteredB():\nreturn spark.table(\"LIVE.customers_raw\").where(...)\n\n```\n\n##### Delta Live Tables Python language reference\n###### Example: Read from a table registered in a metastore\n\nTo read data from a table registered in the Hive metastore, in the function argument omit the `LIVE` keyword and optionally qualify the table name with the database name: \n```\n@dlt.table\ndef customers():\nreturn spark.table(\"sales.customers\").where(...)\n\n``` \nFor an example of reading from a Unity Catalog table, see [Ingest data into a Unity Catalog pipeline](https://docs.databricks.com/delta-live-tables/unity-catalog.html#ingest-data).\n\n", "chunk_id": "436bd5efff0c7af99987f07e00a3c075", "url": "https://docs.databricks.com/delta-live-tables/python-ref.html"} +{"chunked_text": "# Databricks data engineering\n## What is Delta Live Tables?\n### Delta Live Tables language references\n##### Delta Live Tables Python language reference\n###### Example: Access a dataset using `spark.sql`\n\nYou can also return a dataset using a `spark.sql` expression in a query function. 
To read from an internal dataset, prepend `LIVE.` to the dataset name: \n```\n@dlt.table\ndef chicago_customers():\nreturn spark.sql(\"SELECT * FROM LIVE.customers_cleaned WHERE city = 'Chicago'\")\n\n```\n\n", "chunk_id": "36830404cdf9403f93c56fa34215546c", "url": "https://docs.databricks.com/delta-live-tables/python-ref.html"} +{"chunked_text": "# Databricks data engineering\n## What is Delta Live Tables?\n### Delta Live Tables language references\n##### Delta Live Tables Python language reference\n###### Create a table to use as the target of streaming operations\n\nUse the `create_streaming_table()` function to create a target table for records output by streaming operations, including [apply\\_changes()](https://docs.databricks.com/delta-live-tables/python-ref.html#cdc) and [@append\\_flow](https://docs.databricks.com/delta-live-tables/flows.html#append-flows) output records. \nNote \nThe `create_target_table()` and `create_streaming_live_table()` functions are deprecated. Databricks recommends updating existing code to use the `create_streaming_table()` function. \n```\ncreate_streaming_table(\nname = \"\",\ncomment = \"\",\nspark_conf={\"\" : \"\"},\ntable_properties={\"\" : \"\", \"\" : \"\"},\npartition_cols=[\"\", \"\"],\npath=\"\",\nschema=\"schema-definition\",\nexpect_all = {\"\" : \"\"},\nexpect_all_or_drop = {\"\" : \"\"},\nexpect_all_or_fail = {\"\" : \"\"}\n)\n\n``` \n| Arguments |\n| --- |\n| **`name`** Type: `str` The table name. This parameter is required. |\n| **`comment`** Type: `str` An optional description for the table. |\n| **`spark_conf`** Type: `dict` An optional list of Spark configurations for the execution of this query. |\n| **`table_properties`** Type: `dict` An optional list of [table properties](https://docs.databricks.com/delta-live-tables/properties.html) for the table. |\n| **`partition_cols`** Type: `array` An optional list of one or more columns to use for partitioning the table. 
|\n| **`path`** Type: `str` An optional storage location for table data. If not set, the system will default to the pipeline storage location. |\n| **`schema`** Type: `str` or `StructType` An optional schema definition for the table. Schemas can be defined as a SQL DDL string, or with a Python `StructType`. |\n| **`expect_all`** **`expect_all_or_drop`** **`expect_all_or_fail`** Type: `dict` Optional data quality constraints for the table. See [multiple expectations](https://docs.databricks.com/delta-live-tables/expectations.html#expect-all). |\n\n", "chunk_id": "5004117a48c6dbb98757955e49b9fbc5", "url": "https://docs.databricks.com/delta-live-tables/python-ref.html"} +{"chunked_text": "# Databricks data engineering\n## What is Delta Live Tables?\n### Delta Live Tables language references\n##### Delta Live Tables Python language reference\n###### Control how tables are materialized\n\nTables also offer additional control of their materialization: \n* Specify how tables are [partitioned](https://docs.databricks.com/delta-live-tables/python-ref.html#schema-partition-example) using `partition_cols`. You can use partitioning to speed up queries.\n* You can set table properties when you define a view or table. See [Delta Live Tables table properties](https://docs.databricks.com/delta-live-tables/properties.html#table-properties).\n* Set a storage location for table data using the `path` setting. By default, table data is stored in the pipeline storage location if `path` isn\u2019t set.\n* You can use [generated columns](https://docs.databricks.com/delta/generated-columns.html) in your schema definition. See [Example: Specify a schema and partition columns](https://docs.databricks.com/delta-live-tables/python-ref.html#schema-partition-example). \nNote \nFor tables less than 1 TB in size, Databricks recommends letting Delta Live Tables control data organization. 
Unless you expect your table to grow beyond a terabyte, you should generally not specify partition columns.\n\n", "chunk_id": "cb1bc833aa0f22c9996935fd4da6b425", "url": "https://docs.databricks.com/delta-live-tables/python-ref.html"} +{"chunked_text": "# Databricks data engineering\n## What is Delta Live Tables?\n### Delta Live Tables language references\n##### Delta Live Tables Python language reference\n###### Example: Specify a schema and partition columns\n\nYou can optionally specify a table schema using a Python `StructType` or a SQL DDL string. When specified with a DDL string, the definition can include [generated columns](https://docs.databricks.com/delta/generated-columns.html). \nThe following example creates a table called `sales` with a schema specified using a Python `StructType`: \n```\nsales_schema = StructType([\nStructField(\"customer_id\", StringType(), True),\nStructField(\"customer_name\", StringType(), True),\nStructField(\"number_of_line_items\", StringType(), True),\nStructField(\"order_datetime\", StringType(), True),\nStructField(\"order_number\", LongType(), True)]\n)\n\n@dlt.table(\ncomment=\"Raw data on sales\",\nschema=sales_schema)\ndef sales():\nreturn (\"...\")\n\n``` \nThe following example specifies the schema for a table using a DDL string, defines a generated column, and defines a partition column: \n```\n@dlt.table(\ncomment=\"Raw data on sales\",\nschema=\"\"\"\ncustomer_id STRING,\ncustomer_name STRING,\nnumber_of_line_items STRING,\norder_datetime STRING,\norder_number LONG,\norder_day_of_week STRING GENERATED ALWAYS AS (dayofweek(order_datetime))\n\"\"\",\npartition_cols = [\"order_day_of_week\"])\ndef sales():\nreturn (\"...\")\n\n``` \nBy default, Delta Live Tables infers the schema from the `table` definition if you don\u2019t specify a schema.\n\n", "chunk_id": "1edaccc053cb0cafa897b1ef1e3860d2", "url": "https://docs.databricks.com/delta-live-tables/python-ref.html"} +{"chunked_text": "# Databricks data engineering\n## 
What is Delta Live Tables?\n### Delta Live Tables language references\n##### Delta Live Tables Python language reference\n###### Configure a streaming table to ignore changes in a source streaming table\n\nNote \n* The `skipChangeCommits` flag works only with `spark.readStream` using the `option()` function. You cannot use this flag in a `dlt.read_stream()` function.\n* You cannot use the `skipChangeCommits` flag when the source streaming table is defined as the target of an [apply\\_changes()](https://docs.databricks.com/delta-live-tables/python-ref.html#cdc) function. \nBy default, streaming tables require append-only sources. When a streaming table uses another streaming table as a source, and the source streaming table requires updates or deletes, for example, GDPR \u201cright to be forgotten\u201d processing, the `skipChangeCommits` flag can be set when reading the source streaming table to ignore those changes. For more information about this flag, see [Ignore updates and deletes](https://docs.databricks.com/structured-streaming/delta-lake.html#ignore-changes). \n```\n@dlt.table\ndef b():\nreturn spark.readStream.option(\"skipChangeCommits\", \"true\").table(\"LIVE.A\")\n\n```\n\n", "chunk_id": "f51a8ded29879bdd7aabf15febaafafe", "url": "https://docs.databricks.com/delta-live-tables/python-ref.html"} +{"chunked_text": "# Databricks data engineering\n## What is Delta Live Tables?\n### Delta Live Tables language references\n##### Delta Live Tables Python language reference\n###### Python Delta Live Tables properties\n\nThe following tables describe the options and properties you can specify while defining tables and views with Delta Live Tables: \n| @table or @view |\n| --- |\n| **`name`** Type: `str` An optional name for the table or view. If not defined, the function name is used as the table or view name. |\n| **`comment`** Type: `str` An optional description for the table. 
|\n| **`spark_conf`** Type: `dict` An optional list of Spark configurations for the execution of this query. |\n| **`table_properties`** Type: `dict` An optional list of [table properties](https://docs.databricks.com/delta-live-tables/properties.html) for the table. |\n| **`path`** Type: `str` An optional storage location for table data. If not set, the system will default to the pipeline storage location. |\n| **`partition_cols`** Type: `a collection of str` An optional collection, for example, a `list`, of one or more columns to use for partitioning the table. |\n| **`schema`** Type: `str` or `StructType` An optional schema definition for the table. Schemas can be defined as a SQL DDL string, or with a Python `StructType`. |\n| **`temporary`** Type: `bool` Create a table but do not publish metadata for the table. The `temporary` keyword instructs Delta Live Tables to create a table that is available to the pipeline but should not be accessed outside the pipeline. To reduce processing time, a temporary table persists for the lifetime of the pipeline that creates it, and not just a single update. The default is \u2018False\u2019. | \n| Table or view definition |\n| --- |\n| **`def ()`** A Python function that defines the dataset. If the `name` parameter is not set, then `` is used as the target dataset name. |\n| **`query`** A Spark SQL statement that returns a Spark Dataset or Koalas DataFrame. Use `dlt.read()` or `spark.table()` to perform a complete read from a dataset defined in the same pipeline. When using the `spark.table()` function to read from a dataset defined in the same pipeline, prepend the `LIVE` keyword to the dataset name in the function argument. 
For example, to read from a dataset named `customers`: `spark.table(\"LIVE.customers\")` You can also use the `spark.table()` function to read from a table registered in the metastore by omitting the `LIVE` keyword and optionally qualifying the table name with the database name: `spark.table(\"sales.customers\")` Use `dlt.read_stream()` to perform a streaming read from a dataset defined in the same pipeline. Use the `spark.sql` function to define a SQL query to create the return dataset. Use [PySpark](https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/dataframe.html) syntax to define Delta Live Tables queries with Python. | \n| Expectations |\n| --- |\n| **`@expect(\"description\", \"constraint\")`** Declare a data quality constraint identified by `description`. If a row violates the expectation, include the row in the target dataset. |\n| **`@expect_or_drop(\"description\", \"constraint\")`** Declare a data quality constraint identified by `description`. If a row violates the expectation, drop the row from the target dataset. |\n| **`@expect_or_fail(\"description\", \"constraint\")`** Declare a data quality constraint identified by `description`. If a row violates the expectation, immediately stop execution. |\n| **`@expect_all(expectations)`** Declare one or more data quality constraints. `expectations` is a Python dictionary, where the key is the expectation description and the value is the expectation constraint. If a row violates any of the expectations, include the row in the target dataset. |\n| **`@expect_all_or_drop(expectations)`** Declare one or more data quality constraints. `expectations` is a Python dictionary, where the key is the expectation description and the value is the expectation constraint. If a row violates any of the expectations, drop the row from the target dataset. |\n| **`@expect_all_or_fail(expectations)`** Declare one or more data quality constraints. 
`expectations` is a Python dictionary, where the key is the expectation description and the value is the expectation constraint. If a row violates any of the expectations, immediately stop execution. |\n\n", "chunk_id": "326be10aab5099fc586e2e1a11ae8d87", "url": "https://docs.databricks.com/delta-live-tables/python-ref.html"} +{"chunked_text": "# Databricks data engineering\n## What is Delta Live Tables?\n### Delta Live Tables language references\n##### Delta Live Tables Python language reference\n###### Change data capture with Python in Delta Live Tables\n\nUse the `apply_changes()` function in the Python API to use Delta Live Tables CDC functionality. The Delta Live Tables Python interface also provides the [create\\_streaming\\_table()](https://docs.databricks.com/delta-live-tables/python-ref.html#create-target-fn) function. You can use this function to create the target table required by the `apply_changes()` function. \n```\napply_changes(\ntarget = \"\",\nsource = \"\",\nkeys = [\"key1\", \"key2\", \"keyN\"],\nsequence_by = \"\",\nignore_null_updates = False,\napply_as_deletes = None,\napply_as_truncates = None,\ncolumn_list = None,\nexcept_column_list = None,\nstored_as_scd_type = ,\ntrack_history_column_list = None,\ntrack_history_except_column_list = None\n)\n\n``` \nNote \nThe default behavior for `INSERT` and `UPDATE` events is to *upsert* CDC events from the source: update any rows in the target table that match the specified key(s) or insert a new row when a matching record does not exist in the target table. Handling for `DELETE` events can be specified with the `APPLY AS DELETE WHEN` condition. \nImportant \nYou must declare a target streaming table to apply changes into. You can optionally specify the schema for your target table. When specifying the schema of the `apply_changes` target table, you must also include the `__START_AT` and `__END_AT` columns with the same data type as the `sequence_by` field. 
\nSee [APPLY CHANGES API: Simplify change data capture in Delta Live Tables](https://docs.databricks.com/delta-live-tables/cdc.html). \n| Arguments |\n| --- |\n| **`target`** Type: `str` The name of the table to be updated. You can use the [create\\_streaming\\_table()](https://docs.databricks.com/delta-live-tables/python-ref.html#create-target-fn) function to create the target table before executing the `apply_changes()` function. This parameter is required. |\n| **`source`** Type: `str` The data source containing CDC records. This parameter is required. |\n| **`keys`** Type: `list` The column or combination of columns that uniquely identifies a row in the source data. This is used to identify which CDC events apply to specific records in the target table. You can specify either: * A list of strings: `[\"userId\", \"orderId\"]` * A list of Spark SQL `col()` functions: `[col(\"userId\"), col(\"orderId\")]` Arguments to `col()` functions cannot include qualifiers. For example, you can use `col(\"userId\")`, but you cannot use `col(\"source.userId\")`. This parameter is required. |\n| **`sequence_by`** Type: `str` or `col()` The column name specifying the logical order of CDC events in the source data. Delta Live Tables uses this sequencing to handle change events that arrive out of order. You can specify either: * A string: `\"sequenceNum\"` * A Spark SQL `col()` function: `col(\"sequenceNum\")` Arguments to `col()` functions cannot include qualifiers. For example, you can use `col(\"userId\")`, but you cannot use `col(\"source.userId\")`. This parameter is required. |\n| **`ignore_null_updates`** Type: `bool` Allow ingesting updates containing a subset of the target columns. When a CDC event matches an existing row and `ignore_null_updates` is `True`, columns with a `null` will retain their existing values in the target. This also applies to nested columns with a value of `null`. When `ignore_null_updates` is `False`, existing values will be overwritten with `null` values. 
This parameter is optional. The default is `False`. |\n| **`apply_as_deletes`** Type: `str` or `expr()` Specifies when a CDC event should be treated as a `DELETE` rather than an upsert. To handle out-of-order data, the deleted row is temporarily retained as a tombstone in the underlying Delta table, and a view is created in the metastore that filters out these tombstones. The retention interval can be configured with the `pipelines.cdc.tombstoneGCThresholdInSeconds` [table property](https://docs.databricks.com/delta-live-tables/properties.html#table-properties). You can specify either: * A string: `\"Operation = 'DELETE'\"` * A Spark SQL `expr()` function: `expr(\"Operation = 'DELETE'\")` This parameter is optional. |\n| **`apply_as_truncates`** Type: `str` or `expr()` Specifies when a CDC event should be treated as a full table `TRUNCATE`. Because this clause triggers a full truncate of the target table, it should be used only for specific use cases requiring this functionality. The `apply_as_truncates` parameter is supported only for SCD type 1. SCD type 2 does not support truncate. You can specify either: * A string: `\"Operation = 'TRUNCATE'\"` * A Spark SQL `expr()` function: `expr(\"Operation = 'TRUNCATE'\")` This parameter is optional. |\n| **`column_list`** **`except_column_list`** Type: `list` A subset of columns to include in the target table. Use `column_list` to specify the complete list of columns to include. Use `except_column_list` to specify the columns to exclude. You can declare either value as a list of strings or as Spark SQL `col()` functions: * `column_list = [\"userId\", \"name\", \"city\"]` * `column_list = [col(\"userId\"), col(\"name\"), col(\"city\")]` * `except_column_list = [\"operation\", \"sequenceNum\"]` * `except_column_list = [col(\"operation\"), col(\"sequenceNum\")]` Arguments to `col()` functions cannot include qualifiers. For example, you can use `col(\"userId\")`, but you cannot use `col(\"source.userId\")`. This parameter is optional. 
The default is to include all columns in the target table when no `column_list` or `except_column_list` argument is passed to the function. |\n| **`stored_as_scd_type`** Type: `str` or `int` Whether to store records as SCD type 1 or SCD type 2. Set to `1` for SCD type 1 or `2` for SCD type 2. This parameter is optional. The default is SCD type 1. |\n| **`track_history_column_list`** **`track_history_except_column_list`** Type: `list` A subset of output columns to be tracked for history in the target table. Use `track_history_column_list` to specify the complete list of columns to be tracked. Use `track_history_except_column_list` to specify the columns to be excluded from tracking. You can declare either value as a list of strings or as Spark SQL `col()` functions: - `track_history_column_list = [\"userId\", \"name\", \"city\"]` - `track_history_column_list = [col(\"userId\"), col(\"name\"), col(\"city\")]` - `track_history_except_column_list = [\"operation\", \"sequenceNum\"]` - `track_history_except_column_list = [col(\"operation\"), col(\"sequenceNum\")]` Arguments to `col()` functions cannot include qualifiers. For example, you can use `col(\"userId\")`, but you cannot use `col(\"source.userId\")`. This parameter is optional. The default is to include all columns in the target table when no `track_history_column_list` or `track_history_except_column_list` argument is passed to the function. |\n\n", "chunk_id": "5e2be68bf9880e2e2b61f8fb4132c529", "url": "https://docs.databricks.com/delta-live-tables/python-ref.html"} +{"chunked_text": "# What is Delta Lake?\n### Use Delta Lake change data feed on Databricks\n\nChange data feed allows Databricks to track row-level changes between versions of a Delta table. When enabled on a Delta table, the runtime records *change events* for all the data written into the table. This includes the row data along with metadata indicating whether the specified row was inserted, deleted, or updated. 
\nImportant \nChange data feed works in tandem with table history to provide change information. Because cloning a Delta table creates a separate history, the change data feed on cloned tables doesn\u2019t match that of the original table.\n\n", "chunk_id": "338c3a2cd0ce80d16ae3f88a2d9f6431", "url": "https://docs.databricks.com/delta/delta-change-data-feed.html"} +{"chunked_text": "# What is Delta Lake?\n### Use Delta Lake change data feed on Databricks\n#### Incrementally process change data\n\nDatabricks recommends using change data feed in combination with Structured Streaming to incrementally process changes from Delta tables. You must use Structured Streaming for Databricks to automatically track versions for your table\u2019s change data feed. \nNote \nDelta Live Tables provides functionality for easy propagation of change data and storing results as SCD (slowly changing dimension) type 1 or type 2 tables. See [APPLY CHANGES API: Simplify change data capture in Delta Live Tables](https://docs.databricks.com/delta-live-tables/cdc.html). \nTo read the change data feed from a table, you must enable change data feed on that table. See [Enable change data feed](https://docs.databricks.com/delta/delta-change-data-feed.html#enable). \nSet the option `readChangeFeed` to `true` when configuring a stream against a table to read the change data feed, as shown in the following syntax example: \n```\n(spark.readStream.format(\"delta\")\n.option(\"readChangeFeed\", \"true\")\n.table(\"myDeltaTable\")\n)\n\n``` \n```\nspark.readStream.format(\"delta\")\n.option(\"readChangeFeed\", \"true\")\n.table(\"myDeltaTable\")\n\n``` \nBy default, the stream returns the latest snapshot of the table when the stream first starts as an `INSERT` and future changes as change data. \nChange data commits as part of the Delta Lake transaction, and becomes available at the same time the new data commits to the table. \nYou can optionally specify a starting version. 
See [Should I specify a starting version?](https://docs.databricks.com/delta/delta-change-data-feed.html#version-options). \nChange data feed also supports batch execution, which requires specifying a starting version. See [Read changes in batch queries](https://docs.databricks.com/delta/delta-change-data-feed.html#batch). \nOptions like rate limits (`maxFilesPerTrigger`, `maxBytesPerTrigger`) and `excludeRegex` are also supported when reading change data. \nRate limiting can be atomic for versions other than the starting snapshot version. That is, the entire commit version will be rate limited or the entire commit will be returned.\n\n", "chunk_id": "5648963c91f331eab99d51f4cf98c3a2", "url": "https://docs.databricks.com/delta/delta-change-data-feed.html"} +{"chunked_text": "# What is Delta Lake?\n### Use Delta Lake change data feed on Databricks\n#### Should I specify a starting version?\n\nYou can optionally specify a starting version if you want to ignore changes that happened before a particular version. You can specify a version using a timestamp or the version ID number recorded in the Delta transaction log. \nNote \nA starting version is required for batch reads, and many batch patterns can benefit from setting an optional ending version. \nWhen you\u2019re configuring Structured Streaming workloads involving change data feed, it\u2019s important to understand how specifying a starting version impacts processing. \nMany streaming workloads, especially new data processing pipelines, benefit from the default behavior. With the default behavior, the first batch is processed when the stream first records all existing records in the table as `INSERT` operations in the change data feed. \nIf your target table already contains all the records with appropriate changes up to a certain point, specify a starting version to avoid processing the source table state as `INSERT` events. 
\nThe following example syntax demonstrates recovering from a streaming failure in which the checkpoint was corrupted. In this example, assume the following conditions: \n1. Change data feed was enabled on the source table at table creation.\n2. The target downstream table has processed all changes up to and including version 75.\n3. Version history for the source table is available for versions 70 and above. \n```\n(spark.readStream.format(\"delta\")\n.option(\"readChangeFeed\", \"true\")\n.option(\"startingVersion\", 76)\n.table(\"source_table\")\n)\n\n``` \n```\nspark.readStream.format(\"delta\")\n.option(\"readChangeFeed\", \"true\")\n.option(\"startingVersion\", 76)\n.table(\"source_table\")\n\n``` \nIn this example, you must also specify a new checkpoint location. \nImportant \nIf you specify a starting version, the stream fails to start from a new checkpoint if the starting version is no longer present in the table history. Delta Lake cleans up historic versions automatically, meaning that all specified starting versions are eventually deleted. \nSee [Can I use change data feed to replay the entire history of a table?](https://docs.databricks.com/delta/delta-change-data-feed.html#replay).\n\n", "chunk_id": "54dfb305daf30024a20fc146f50a224f", "url": "https://docs.databricks.com/delta/delta-change-data-feed.html"} +{"chunked_text": "# What is Delta Lake?\n### Use Delta Lake change data feed on Databricks\n#### Read changes in batch queries\n\nYou can use batch query syntax to read all changes starting from a particular version or to read changes within a specified range of versions. \nYou specify a version as an integer and a timestamp as a string in the format `yyyy-MM-dd[ HH:mm:ss[.SSS]]`. \nThe start and end versions are inclusive in the queries. To read the changes from a particular start version to the latest version of the table, specify only the starting version. 
\nIf you provide a version lower or timestamp older than one that has recorded change events\u2014that is, when the change data feed was enabled\u2014an error is thrown indicating that the change data feed was not enabled. \nThe following syntax examples demonstrate using starting and ending version options with batch reads: \n```\n-- version as ints or longs e.g. changes from version 0 to 10\nSELECT * FROM table_changes('tableName', 0, 10)\n\n-- timestamp as string formatted timestamps\nSELECT * FROM table_changes('tableName', '2021-04-21 05:45:46', '2021-05-21 12:00:00')\n\n-- providing only the startingVersion/timestamp\nSELECT * FROM table_changes('tableName', 0)\n\n-- database/schema names inside the string for table name, with backticks for escaping dots and special characters\nSELECT * FROM table_changes('dbName.`dotted.tableName`', '2021-04-21 06:45:46' , '2021-05-21 12:00:00')\n\n``` \n```\n# version as ints or longs\nspark.read.format(\"delta\") \\\n.option(\"readChangeFeed\", \"true\") \\\n.option(\"startingVersion\", 0) \\\n.option(\"endingVersion\", 10) \\\n.table(\"myDeltaTable\")\n\n# timestamps as formatted timestamp\nspark.read.format(\"delta\") \\\n.option(\"readChangeFeed\", \"true\") \\\n.option(\"startingTimestamp\", '2021-04-21 05:45:46') \\\n.option(\"endingTimestamp\", '2021-05-21 12:00:00') \\\n.table(\"myDeltaTable\")\n\n# providing only the startingVersion/timestamp\nspark.read.format(\"delta\") \\\n.option(\"readChangeFeed\", \"true\") \\\n.option(\"startingVersion\", 0) \\\n.table(\"myDeltaTable\")\n\n``` \n```\n// version as ints or longs\nspark.read.format(\"delta\")\n.option(\"readChangeFeed\", \"true\")\n.option(\"startingVersion\", 0)\n.option(\"endingVersion\", 10)\n.table(\"myDeltaTable\")\n\n// timestamps as formatted timestamp\nspark.read.format(\"delta\")\n.option(\"readChangeFeed\", \"true\")\n.option(\"startingTimestamp\", \"2021-04-21 05:45:46\")\n.option(\"endingTimestamp\", \"2021-05-21 
12:00:00\")\n.table(\"myDeltaTable\")\n\n// providing only the startingVersion/timestamp\nspark.read.format(\"delta\")\n.option(\"readChangeFeed\", \"true\")\n.option(\"startingVersion\", 0)\n.table(\"myDeltaTable\")\n\n``` \nNote \nBy default, if a user passes in a version or timestamp exceeding the last commit on a table, the error `timestampGreaterThanLatestCommit` is thrown. In Databricks Runtime 11.3 LTS and above, change data feed can handle the out-of-range version case if the user sets the following configuration to `true`: \n```\nset spark.databricks.delta.changeDataFeed.timestampOutOfRange.enabled = true;\n\n``` \nIf you provide a start version greater than the last commit on a table or a start timestamp newer than the last commit on a table, then when the preceding configuration is enabled, an empty read result is returned. \nIf you provide an end version greater than the last commit on a table or an end timestamp newer than the last commit on a table, then when the preceding configuration is enabled in batch read mode, all changes between the start version and the last commit are returned.\n\n", "chunk_id": "7b3412207940a299a01ad3083ea9013d", "url": "https://docs.databricks.com/delta/delta-change-data-feed.html"} +{"chunked_text": "# What is Delta Lake?\n### Use Delta Lake change data feed on Databricks\n#### What is the schema for the change data feed?\n\nWhen you read from the change data feed for a table, the schema for the latest table version is used. \nNote \nMost schema change and evolution operations are fully supported. Tables with column mapping enabled do not support all use cases and demonstrate different behavior. See [Change data feed limitations for tables with column mapping enabled](https://docs.databricks.com/delta/delta-change-data-feed.html#column-mapping-limitations). 
\nIn addition to the data columns from the schema of the Delta table, change data feed contains metadata columns that identify the type of change event: \n| Column name | Type | Values |\n| --- | --- | --- |\n| `_change_type` | String | `insert`, `update_preimage`, `update_postimage`, `delete` [(1)](https://docs.databricks.com/delta/delta-change-data-feed.html#1) |\n| `_commit_version` | Long | The Delta log or table version containing the change. |\n| `_commit_timestamp` | Timestamp | The timestamp of when the commit was created. | \n**(1)** `preimage` is the value before the update, `postimage` is the value after the update. \nNote \nYou cannot enable change data feed on a table if the schema contains columns with the same names as these added columns. Rename columns in the table to resolve this conflict before trying to enable change data feed.\n\n", "chunk_id": "1004f36bdeca5bb47222ce08a5917c4d", "url": "https://docs.databricks.com/delta/delta-change-data-feed.html"} +{"chunked_text": "# What is Delta Lake?\n### Use Delta Lake change data feed on Databricks\n#### Enable change data feed\n\nYou can only read the change data feed for enabled tables. You must explicitly enable the change data feed option using one of the following methods: \n* **New table**: Set the table property `delta.enableChangeDataFeed = true` in the `CREATE TABLE` command. \n```\nCREATE TABLE student (id INT, name STRING, age INT) TBLPROPERTIES (delta.enableChangeDataFeed = true)\n\n```\n* **Existing table**: Set the table property `delta.enableChangeDataFeed = true` in the `ALTER TABLE` command. \n```\nALTER TABLE myDeltaTable SET TBLPROPERTIES (delta.enableChangeDataFeed = true)\n\n```\n* **All new tables**: \n```\nset spark.databricks.delta.properties.defaults.enableChangeDataFeed = true;\n\n``` \nImportant \nOnly changes made after you enable the change data feed are recorded. 
Past changes to a table are not captured.\n\n### Use Delta Lake change data feed on Databricks\n#### Change data storage\n\nEnabling change data feed causes a small increase in storage costs for a table. Change data records are generated as the query runs, and are generally much smaller than the total size of rewritten files. \nDatabricks records change data for `UPDATE`, `DELETE`, and `MERGE` operations in the `_change_data` folder under the table directory. Some operations, such as insert-only operations and full-partition deletions, do not generate data in the `_change_data` directory because Databricks can efficiently compute the change data feed directly from the transaction log. \nAll reads against data files in the `_change_data` folder should go through supported Delta Lake APIs. \nThe files in the `_change_data` folder follow the retention policy of the table. Change data feed data is deleted when the `VACUUM` command runs.\n\n", "chunk_id": "9357c7962626f23a5db71e465295e3fe", "url": "https://docs.databricks.com/delta/delta-change-data-feed.html"} +{"chunked_text": "# What is Delta Lake?\n### Use Delta Lake change data feed on Databricks\n#### Can I use change data feed to replay the entire history of a table?\n\nChange data feed is not intended to serve as a permanent record of all changes to a table. Change data feed only records changes that occur after it\u2019s enabled. \nChange data feed and Delta Lake allow you to always reconstruct a full snapshot of a source table, meaning you can start a new streaming read against a table with change data feed enabled and capture the current version of that table and all changes that occur after. \nYou must treat records in the change data feed as transient and only accessible for a specified retention window. The Delta transaction log removes table versions and their corresponding change data feed versions at regular intervals. 
When a version is removed from the transaction log, you can no longer read the change data feed for that version. \nIf your use case requires maintaining a permanent history of all changes to a table, you should use incremental logic to write records from the change data feed to a new table. The following code example demonstrates using `trigger.AvailableNow`, which leverages the incremental processing of Structured Streaming but processes available data as a batch workload. You can schedule this workload asynchronously with your main processing pipelines to create a backup of change data feed for auditing purposes or full replayability. \n```\n(spark.readStream.format(\"delta\")\n.option(\"readChangeFeed\", \"true\")\n.table(\"source_table\")\n.writeStream\n.option(\"checkpointLocation\", )\n.trigger(availableNow=True)\n.toTable(\"target_table\")\n)\n\n``` \n```\nspark.readStream.format(\"delta\")\n.option(\"readChangeFeed\", \"true\")\n.table(\"source_table\")\n.writeStream\n.option(\"checkpointLocation\", )\n.trigger(Trigger.AvailableNow)\n.toTable(\"target_table\")\n\n```\n\n", "chunk_id": "fefbd10c55273e3b47f40542a35c5ca0", "url": "https://docs.databricks.com/delta/delta-change-data-feed.html"} +{"chunked_text": "# What is Delta Lake?\n### Use Delta Lake change data feed on Databricks\n#### Change data feed limitations for tables with column mapping enabled\n\nWith column mapping enabled on a Delta table, you can drop or rename columns in the table without rewriting data files for existing data. With column mapping enabled, change data feed has limitations after performing non-additive schema changes such as renaming or dropping a column, changing data type, or nullability changes. 
\nImportant \n* You cannot read change data feed for a transaction or range in which a non-additive schema change occurs using batch semantics.\n* In Databricks Runtime 12.2 LTS and below, tables with column mapping enabled that have experienced non-additive schema changes do not support streaming reads on change data feed. See [Streaming with column mapping and schema changes](https://docs.databricks.com/delta/delta-column-mapping.html#schema-tracking).\n* In Databricks Runtime 11.3 LTS and below, you cannot read change data feed for tables with column mapping enabled that have experienced column renaming or dropping. \nIn Databricks Runtime 12.2 LTS and above, you can perform batch reads on change data feed for tables with column mapping enabled that have experienced non-additive schema changes. Instead of using the schema of the latest version of the table, read operations use the schema of the end version of the table specified in the query. Queries still fail if the version range specified spans a non-additive schema change.\n\n", "chunk_id": "03fc69bbc0c43eef0b6122ded481cf52", "url": "https://docs.databricks.com/delta/delta-change-data-feed.html"} +{"chunked_text": "# Databricks documentation archive\n## Databricks CLI (legacy)\n#### Cluster Policies CLI (legacy)\n\nImportant \nThis documentation has been retired and might not be updated. \nThis information applies to legacy Databricks CLI versions 0.18 and below. Databricks recommends that you use newer Databricks CLI version 0.205 or above instead. See [What is the Databricks CLI?](https://docs.databricks.com/dev-tools/cli/index.html). To find your version of the Databricks CLI, run `databricks -v`. \nTo migrate from Databricks CLI version 0.18 or below to Databricks CLI version 0.205 or above, see [Databricks CLI migration](https://docs.databricks.com/dev-tools/cli/migrate.html). \nOnly workspace admin users can create, edit, and delete policies. Workspace admin users also have access to all policies. 
\nFor requirements and limitations on cluster policies, see [Create and manage compute policies](https://docs.databricks.com/admin/clusters/policies.html). \nYou run Databricks Cluster Policies CLI subcommands by appending them to `databricks cluster-policies`. These subcommands call the [Cluster Policies API](https://docs.databricks.com/api/workspace/clusterpolicies). \n```\ndatabricks cluster-policies --help\n\n``` \n```\nUsage: databricks cluster-policies [OPTIONS] COMMAND [ARGS]...\n\nUtility to interact with Databricks cluster policies.\n\nOptions:\n-v, --version [VERSION]\n--debug Debug mode. Shows full stack trace on error.\n--profile TEXT CLI connection profile to use. The default profile is\n\"DEFAULT\".\n\n-h, --help Show this message and exit.\n\nCommands:\ncreate Creates a Databricks cluster policy.\ndelete Removes a Databricks cluster policy given its ID.\nedit Edits a Databricks cluster policy.\nget Retrieves metadata about a Databricks cluster policy.\nlist Lists Databricks cluster policies.\n\n```\n\n", "chunk_id": "b1eb5579b82f65084e5709d22150b073", "url": "https://docs.databricks.com/archive/dev-tools/cli/cluster-policies-cli.html"} +{"chunked_text": "# Databricks documentation archive\n## Databricks CLI (legacy)\n#### Cluster Policies CLI (legacy)\n##### Create a cluster policy\n\nTo display usage documentation, run `databricks cluster-policies create --help`. \n```\ndatabricks cluster-policies create --json-file create-cluster-policy.json\n\n``` \n`create-cluster-policy.json`: \n```\n{\n\"name\": \"Example Policy\",\n\"definition\": \"{\\\"spark_version\\\":{\\\"type\\\":\\\"fixed\\\",\\\"value\\\":\\\"next-major-version-scala2.12\\\",\\\"hidden\\\":true}}\"\n}\n\n``` \n```\n{\n\"policy_id\": \"1A234567B890123C\"\n}\n\n```\n\n#### Cluster Policies CLI (legacy)\n##### Delete a cluster policy\n\nTo view help, run `databricks cluster-policies delete --help`. 
\n```\ndatabricks cluster-policies delete --policy-id 1A234567B890123C\n\n``` \nOn success, this command displays nothing.\n\n#### Cluster Policies CLI (legacy)\n##### Change a cluster policy\n\nTo display usage documentation, run `databricks cluster-policies edit --help`. \n```\ndatabricks cluster-policies edit --json-file edit-cluster-policy.json\n\n``` \n`edit-cluster-policy.json`: \n```\n{\n\"policy_id\": \"1A234567B890123C\",\n\"name\": \"Example Policy\",\n\"definition\": \"{\\\"spark_version\\\":{\\\"type\\\":\\\"fixed\\\",\\\"value\\\":\\\"next-major-version-scala2.12\\\",\\\"hidden\\\":false}}\",\n\"created_at_timestamp\": 1619477108000\n}\n\n``` \nOn success, this command displays nothing.\n\n", "chunk_id": "fa44d85aa9232669d5ae8d9c5c46cb34", "url": "https://docs.databricks.com/archive/dev-tools/cli/cluster-policies-cli.html"} +{"chunked_text": "# Databricks documentation archive\n## Databricks CLI (legacy)\n#### Cluster Policies CLI (legacy)\n##### List information about a cluster policy\n\nTo display usage documentation, run `databricks cluster-policies get --help`. \n```\ndatabricks cluster-policies get --policy-id A123456BCD789012\n\n``` \n```\n{\n\"policy_id\": \"A123456BCD789012\",\n\"name\": \"Cluster Policy Demo\",\n\"definition\": \"{\\n \\\"spark_env_vars.PYSPARK_PYTHON\\\": {\\n \\\"type\\\": \\\"fixed\\\",\\n \\\"value\\\": \\\"/databricks/python3/bin/python27\\\"\\n }\\n}\",\n\"created_at_timestamp\": 1615504519000\n}\n\n```\n\n#### Cluster Policies CLI (legacy)\n##### List information about available cluster policies\n\nTo display usage documentation, run `databricks cluster-policies list --help`. 
\n```\ndatabricks cluster-policies list --output JSON\n\n``` \n```\n{\n\"policies\": [\n{\n\"policy_id\": \"A123456BCD789012\",\n\"name\": \"Cluster Policy Demo\",\n\"definition\": \"{\\n \\\"spark_env_vars.PYSPARK_PYTHON\\\": {\\n \\\"type\\\": \\\"fixed\\\",\\n \\\"value\\\": \\\"/databricks/python3/bin/python27\\\"\\n }\\n}\",\n\"created_at_timestamp\": 1615504519000\n},\n...\n],\n\"total_count\": 16\n}\n\n```\n\n", "chunk_id": "c7ff93b6fc104f2f2c09a15e47d33a53", "url": "https://docs.databricks.com/archive/dev-tools/cli/cluster-policies-cli.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n### Error classes in Databricks\n##### MISSING\\_ATTRIBUTES error class\n\nSQLSTATE: none assigned \nResolved attribute(s) `` missing from `` in operator ``.\n\n##### MISSING\\_ATTRIBUTES error class\n###### RESOLVED\\_ATTRIBUTE\\_APPEAR\\_IN\\_OPERATION\n\nAttribute(s) with the same name appear in the operation: ``. \nPlease check if the right attribute(s) are used.\n\n", "chunk_id": "a6b82a168e6a73f45f27723689ddc9cf", "url": "https://docs.databricks.com/error-messages/missing-attributes-error-class.html"} +{"chunked_text": "# Ingest data into a Databricks lakehouse\n## What is Auto Loader?\n#### Auto Loader FAQ\n\nCommonly asked questions about Databricks Auto Loader.\n\n#### Auto Loader FAQ\n##### Does Auto Loader process the file again when the file gets appended or overwritten?\n\nFiles are processed exactly once unless `cloudFiles.allowOverwrites` is enabled. When a file is appended to or overwritten, Databricks cannot guarantee which version of the file will be processed. You should also use caution when enabling `cloudFiles.allowOverwrites` in file notification mode, where Auto Loader might identify new files through both file notifications and directory listing. 
Due to the discrepancy between file notification event time and file modification time, Auto Loader might obtain two different timestamps and therefore ingest the same file twice, even when the file is only written once. \nIn general, Databricks recommends you use Auto Loader to ingest only immutable files and avoid setting `cloudFiles.allowOverwrites`. If this does not meet your requirements, contact your Databricks account team.\n\n#### Auto Loader FAQ\n##### If my data files do not arrive continuously, but in regular intervals, for example, once a day, should I still use this source and are there any benefits?\n\nIn this case, you can set up a `Trigger.AvailableNow` (available in Databricks Runtime 10.4 LTS and above) Structured Streaming job and schedule it to run after the anticipated file arrival time. Auto Loader works well with both infrequent and frequent updates. Even if the eventual updates are very large, Auto Loader scales well to the input size. Auto Loader\u2019s efficient file discovery techniques and schema evolution capabilities make Auto Loader the recommended method for incremental data ingestion.\n\n#### Auto Loader FAQ\n##### What happens if I change the checkpoint location when restarting the stream?\n\nA checkpoint location maintains important identifying information of a stream. Changing the checkpoint location effectively means that you have abandoned the previous stream and started a new stream.\n\n", "chunk_id": "b8e5f9c4fb645f4c65c02f72dea52ce2", "url": "https://docs.databricks.com/ingestion/auto-loader/faq.html"} +{"chunked_text": "# Ingest data into a Databricks lakehouse\n## What is Auto Loader?\n#### Auto Loader FAQ\n##### Do I need to create event notification services beforehand?\n\nNo. If you choose file notification mode and provide the required permissions, Auto Loader can create file notification services for you. 
See [What is Auto Loader file notification mode?](https://docs.databricks.com/ingestion/auto-loader/file-notification-mode.html)\n\n#### Auto Loader FAQ\n##### How do I clean up the event notification resources created by Auto Loader?\n\nYou can use the [cloud resource manager](https://docs.databricks.com/ingestion/auto-loader/file-notification-mode.html#cloud-resource-management) to list and tear down resources.\nYou can also delete these resources manually using the cloud provider\u2019s UI or APIs.\n\n#### Auto Loader FAQ\n##### Can I run multiple streaming queries from different input directories on the same bucket/container?\n\nYes, as long as they are not parent-child directories; for example, `prod-logs/` and `prod-logs/usage/` would not work because `/usage` is a child directory of `/prod-logs`.\n\n#### Auto Loader FAQ\n##### Can I use this feature when there are existing file notifications on my bucket or container?\n\nYes, as long as your input directory does not conflict with the existing notification prefix (for example, the above parent-child directories).\n\n", "chunk_id": "192c08c186875ba8faecaee5f9d05f99", "url": "https://docs.databricks.com/ingestion/auto-loader/faq.html"} +{"chunked_text": "# Ingest data into a Databricks lakehouse\n## What is Auto Loader?\n#### Auto Loader FAQ\n##### How does Auto Loader infer schema?\n\nWhen the DataFrame is first defined, Auto Loader lists your source directory and chooses the most recent (by file modification time) 50 GB of data or 1000 files, and uses those to infer your data schema. \nAuto Loader also infers partition columns by examining the source directory structure and looks for file paths that contain the `/key=value/` structure. 
If the source directory has an inconsistent structure, for example: \n```\nbase/path/partition=1/date=2020-12-31/file1.json\n// inconsistent because date and partition directories are in different orders\nbase/path/date=2020-12-31/partition=2/file2.json\n// inconsistent because the date directory is missing\nbase/path/partition=3/file3.json\n\n``` \nAuto Loader infers the partition columns as empty. Use `cloudFiles.partitionColumns` to explicitly parse columns from the directory structure.\n\n#### Auto Loader FAQ\n##### How does Auto Loader behave when the source folder is empty?\n\nIf the source directory is empty, Auto Loader requires you to provide a schema because there is no data from which to infer one.\n\n#### Auto Loader FAQ\n##### When does Auto Loader infer schema? Does it evolve automatically after every micro-batch?\n\nThe schema is inferred when the DataFrame is first defined in your code. During each micro-batch, schema changes are evaluated on the fly; therefore, you don\u2019t need to worry about performance hits. When the stream restarts, it picks up the evolved schema from the schema location and starts executing without any overhead from inference.\n\n#### Auto Loader FAQ\n##### What\u2019s the performance impact on ingesting the data when using Auto Loader schema inference?\n\nYou should expect schema inference to take a couple of minutes for very large source directories during initial schema inference. You shouldn\u2019t observe significant performance hits otherwise during stream execution. 
If you run your code in a Databricks notebook, you can see status updates that specify when Auto Loader will be listing your directory for sampling and inferring your data schema.\n\n", "chunk_id": "bf450cb14d94a723768609e288ba6a44", "url": "https://docs.databricks.com/ingestion/auto-loader/faq.html"} +{"chunked_text": "# Ingest data into a Databricks lakehouse\n## What is Auto Loader?\n#### Auto Loader FAQ\n##### Due to a bug, a bad file has changed my schema drastically. What should I do to roll back a schema change?\n\nContact Databricks support for help.\n\n", "chunk_id": "68b5328e6e506aa4e13c34d090c9f995", "url": "https://docs.databricks.com/ingestion/auto-loader/faq.html"} +{"chunked_text": "# AI and Machine Learning on Databricks\n## What is a feature store?\n### Work with features in Workspace Feature Store\n##### Discover features and track feature lineage\n\nWith Databricks Feature Store, you can: \n* Search for feature tables by feature table name, feature, data source, or tag.\n* Control access to feature tables.\n* Identify the data sources used to create a feature table.\n* Identify models that use a particular feature.\n* Add a tag to a feature table.\n* Check feature freshness. \nTo access the Feature Store UI, in the sidebar, select **Machine Learning > Feature Store**. The Feature Store UI lists all of the available feature tables, along with the features in the table and the following metadata: \n* Who created the feature table.\n* Data sources used to compute the feature table.\n* Online stores where the feature table has been published.\n* Scheduled jobs that compute the features in the feature table.\n* The last time a notebook or job wrote to the feature table. \n![Feature store page](https://docs.databricks.com/_images/feature-store-ui.png)\n\n##### Discover features and track feature lineage\n###### Search and browse for feature tables\n\nUse the search box to search for feature tables. 
You can enter all or part of the name of a feature table, a feature, or a data source used for feature computation. You can also enter all or part of the key or value of a tag. Search text is case-insensitive. \n![Feature search example](https://docs.databricks.com/_images/feature-search-example.png)\n\n##### Discover features and track feature lineage\n###### Control access to feature tables\n\nSee [Control access to feature tables](https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/access-control.html).\n\n", "chunk_id": "d8f1644cfefe0866f6ec0a7334fa4798", "url": "https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/ui.html"} +{"chunked_text": "# AI and Machine Learning on Databricks\n## What is a feature store?\n### Work with features in Workspace Feature Store\n##### Discover features and track feature lineage\n###### Track feature lineage and freshness\n\nIn the UI you can track both how a feature was created and where it is used. For example, you can track the raw data sources, notebooks, and jobs that were used to compute the features. You can also track the online stores where the feature is published, the models trained with it, the serving endpoints that access it, and the notebooks and jobs that read it. \nIn the Feature Store UI, click the name of any feature table to display the feature table page. \nOn the feature table page, the **Producers** table provides information about all of the notebooks and jobs that write to this feature table so you can easily confirm the status of scheduled jobs and the freshness of the feature table. \n![producers table](https://docs.databricks.com/_images/producers-table.png) \nThe **Features** table lists all of the features in the table and provides links to the models, endpoints, jobs, and notebooks that use the feature. 
\n![features table](https://docs.databricks.com/_images/features-table.png) \nTo return to the main Feature Store UI page, click **Feature Store** near the top of the page.\n\n", "chunk_id": "d87f1f275db80b575ad52131aa103f13", "url": "https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/ui.html"} +{"chunked_text": "# AI and Machine Learning on Databricks\n## What is a feature store?\n### Work with features in Workspace Feature Store\n##### Discover features and track feature lineage\n###### Add a tag to a feature table\n\nTags are key-value pairs that you can create and use to [search for feature tables](https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/ui.html#search-and-browse-for-feature-tables). \n1. On the feature table page, click ![Tag icon](https://docs.databricks.com/_images/tags1.png) if it is not already open. The tags table appears. \n![tag table](https://docs.databricks.com/_images/tags-open.png)\n2. Click in the **Name** and **Value** fields and enter the key and value for your tag.\n3. Click **Add**. \n![add tag](https://docs.databricks.com/_images/tag-add.png) \n### Edit or delete a tag \nTo edit or delete an existing tag, use the icons in the **Actions** column. \n![tag actions](https://docs.databricks.com/_images/tag-edit-or-delete.png)\n\n", "chunk_id": "f6fb3998bd245163a67fc7bb1bfa3554", "url": "https://docs.databricks.com/machine-learning/feature-store/workspace-feature-store/ui.html"} +{"chunked_text": "# Databricks release notes\n## Databricks Runtime release notes versions and compatibility\n#### Databricks Runtime 14.1\n\nThe following release notes provide information about Databricks Runtime 14.1, powered by Apache Spark 3.5.0. 
\nDatabricks released these images in October 2023.\n\n", "chunk_id": "caad30b96e2bf9003c1e1862f4d98120", "url": "https://docs.databricks.com/release-notes/runtime/14.1.html"} +{"chunked_text": "# Databricks release notes\n## Databricks Runtime release notes versions and compatibility\n#### Databricks Runtime 14.1\n##### New features and improvements\n\n* [array\\_insert() is 1-based for negative indexes](https://docs.databricks.com/release-notes/runtime/14.1.html#array_insert-is-1-based-for-negative-indexes)\n* [Delta v2 checkpoints enabled by default with liquid clustering](https://docs.databricks.com/release-notes/runtime/14.1.html#delta-v2-checkpoints-enabled-by-default-with-liquid-clustering)\n* [Drop Delta table feature in Public Preview](https://docs.databricks.com/release-notes/runtime/14.1.html#drop-delta-table-feature-in-public-preview)\n* [Delta Sharing: Recipients can perform batch queries on shared tables with deletion vectors (Public Preview)](https://docs.databricks.com/release-notes/runtime/14.1.html#delta-sharing-recipients-can-perform-batch-queries-on-shared-tables-with-deletion-vectors-public-preview)\n* [Delta Sharing: Recipients can perform batch queries on shared tables with column mapping (Public Preview)](https://docs.databricks.com/release-notes/runtime/14.1.html#delta-sharing-recipients-can-perform-batch-queries-on-shared-tables-with-column-mapping-public-preview)\n* [Stream from Unity Catalog views in Public Preview](https://docs.databricks.com/release-notes/runtime/14.1.html#stream-from-unity-catalog-views-in-public-preview)\n* [Apache Pulsar connector in Public Preview](https://docs.databricks.com/release-notes/runtime/14.1.html#apache-pulsar-connector-in-public-preview)\n* [Upgraded Snowflake driver](https://docs.databricks.com/release-notes/runtime/14.1.html#upgraded-snowflake-driver)\n* [SQL Session variables](https://docs.databricks.com/release-notes/runtime/14.1.html#sql-session-variables)\n* [Named parameter invocation for SQL and 
Python UDF.](https://docs.databricks.com/release-notes/runtime/14.1.html#named-parameter-invocation-for-sql-and-python-udf)\n* [Table arguments to functions support partitioning and ordering.](https://docs.databricks.com/release-notes/runtime/14.1.html#table-arguments-to-functions-support-partitioning-and-ordering)\n* [New and enhanced builtin SQL functions](https://docs.databricks.com/release-notes/runtime/14.1.html#new-and-enhanced-builtin-sql-functions)\n* [Improved handling of correlated subqueries](https://docs.databricks.com/release-notes/runtime/14.1.html#improved-handling-of-correlated-subqueries) \n### [array\\_insert() is 1-based for negative indexes](https://docs.databricks.com/release-notes/runtime/14.1.html#id1) \nThe `array_insert` function is 1-based for both positive and negative indexes. It now inserts a new element at the end of the input array for the index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`. \n### [Delta v2 checkpoints enabled by default with liquid clustering](https://docs.databricks.com/release-notes/runtime/14.1.html#id2) \nNewly created Delta tables with liquid clustering use v2 checkpoints by default. See [Compatibility for tables with liquid clustering](https://docs.databricks.com/delta/clustering.html#compatibility). \n### [Drop Delta table feature in Public Preview](https://docs.databricks.com/release-notes/runtime/14.1.html#id3) \nYou can now drop some table features for Delta tables. Current support includes dropping `deletionVectors` and `v2Checkpoint`. See [Drop Delta table features](https://docs.databricks.com/delta/drop-feature.html). \n### [Delta Sharing: Recipients can perform batch queries on shared tables with deletion vectors (Public Preview)](https://docs.databricks.com/release-notes/runtime/14.1.html#id4) \nDelta Sharing recipients can now perform batch queries on shared tables that use deletion vectors. 
See [Add tables with deletion vectors or column mapping to a share](https://docs.databricks.com/data-sharing/create-share.html#deletion-vectors) and [Read tables with deletion vectors or column mapping enabled](https://docs.databricks.com/data-sharing/read-data-databricks.html#deletion-vectors). \n### [Delta Sharing: Recipients can perform batch queries on shared tables with column mapping (Public Preview)](https://docs.databricks.com/release-notes/runtime/14.1.html#id5) \nDelta Sharing recipients can now perform batch queries on shared tables that use column mapping. See [Add tables with deletion vectors or column mapping to a share](https://docs.databricks.com/data-sharing/create-share.html#deletion-vectors) and [Read tables with deletion vectors or column mapping enabled](https://docs.databricks.com/data-sharing/read-data-databricks.html#deletion-vectors). \n### [Stream from Unity Catalog views in Public Preview](https://docs.databricks.com/release-notes/runtime/14.1.html#id6) \nYou can now use Structured Streaming to perform streaming reads from views registered with Unity Catalog. Databricks only supports streaming reads from views defined against Delta tables. See [Stream from Unity Catalog views](https://docs.databricks.com/structured-streaming/views.html). \n### [Apache Pulsar connector in Public Preview](https://docs.databricks.com/release-notes/runtime/14.1.html#id7) \nYou can now use Structured Streaming to stream data from Apache Pulsar on Databricks. See [Stream from Apache Pulsar](https://docs.databricks.com/connect/streaming/pulsar.html). 
\n### [Upgraded Snowflake driver](https://docs.databricks.com/release-notes/runtime/14.1.html#id8) \nThe Snowflake JDBC driver now uses version 3.13.33. \n### [SQL Session variables](https://docs.databricks.com/release-notes/runtime/14.1.html#id9) \nThis release introduces the ability to declare temporary variables in a session which can be set and then referred to from within queries. See [Variables](https://docs.databricks.com/sql/language-manual/sql-ref-variables.html). \n### [Named parameter invocation for SQL and Python UDF.](https://docs.databricks.com/release-notes/runtime/14.1.html#id10) \nYou can now use [Named parameter invocation](https://docs.databricks.com/sql/language-manual/sql-ref-function-invocation.html#named-parameter-invocation) on [SQL and Python UDF](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html). \n### [Table arguments to functions support partitioning and ordering.](https://docs.databricks.com/release-notes/runtime/14.1.html#id11) \nYou can now use `PARTITION BY` and `ORDER BY` clauses to control how table arguments are passed to a function during [function invocation](https://docs.databricks.com/sql/language-manual/sql-ref-function-invocation.html). \n### [New and enhanced builtin SQL functions](https://docs.databricks.com/release-notes/runtime/14.1.html#id12) \nThe following builtin functions have been added: \n* [from\\_xml](https://docs.databricks.com/sql/language-manual/functions/from_xml.html): Parses an XML `STRING` into a `STRUCT`.\n* [schema\\_of\\_xml](https://docs.databricks.com/sql/language-manual/functions/schema_of_xml.html): Derives a schema from an XML `STRING`.\n* [session\\_user](https://docs.databricks.com/sql/language-manual/functions/session_user.html): Returns the logged-in user.\n* [try\\_reflect](https://docs.databricks.com/sql/language-manual/functions/try_reflect.html): Returns `NULL` instead of the exception if a Java method fails. 
\nThe following builtin functions have been enhanced: \n* [mode](https://docs.databricks.com/sql/language-manual/functions/mode.html): Support for an optional parameter forcing a deterministic result.\n* [to\\_char](https://docs.databricks.com/sql/language-manual/functions/to_char.html): New support for `DATE`, `TIMESTAMP`, and `BINARY`.\n* [to\\_varchar](https://docs.databricks.com/sql/language-manual/functions/to_varchar.html): New support for `DATE`, `TIMESTAMP`, and `BINARY`. \n### [Improved handling of correlated subqueries](https://docs.databricks.com/release-notes/runtime/14.1.html#id13) \nThe ability to process correlation in subqueries has been extended: \n* Handle limit and order by in correlated scalar (lateral) subqueries.\n* Support window functions in correlated scalar subqueries.\n* Support correlated references in join predicates for scalar and lateral subqueries.\n\n", "chunk_id": "718e952d86582e703aadaa155a786e8d", "url": "https://docs.databricks.com/release-notes/runtime/14.1.html"} +{"chunked_text": "# Databricks release notes\n## Databricks Runtime release notes versions and compatibility\n#### Databricks Runtime 14.1\n##### Behavior change\n\n### Strict type checking in Photon Parquet reader \nPhoton fails when trying to read a decimal value from a Parquet column that is not a decimal type. 
Photon also fails when reading a fixed-length byte array from Parquet as a string.\n\n", "chunk_id": "d6ed50cb43e138639e6ecb2e633bde59", "url": "https://docs.databricks.com/release-notes/runtime/14.1.html"} +{"chunked_text": "# Databricks release notes\n## Databricks Runtime release notes versions and compatibility\n#### Databricks Runtime 14.1\n##### Library upgrades\n\n* Upgraded Python libraries: \n+ filelock from 3.12.2 to 3.12.3\n+ s3transfer from 0.6.1 to 0.6.2\n* Upgraded Java libraries: \n+ com.uber.h3 from 3.7.0 to 3.7.3\n+ io.airlift.aircompressor from 0.24 to 0.25\n+ io.delta.delta-sharing-spark\\_2.12 from 0.7.1 to 0.7.5\n+ io.netty.netty-all from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-buffer from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-codec from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-codec-http from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-codec-http2 from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-codec-socks from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-common from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-handler from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-handler-proxy from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-resolver from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-transport from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-transport-classes-epoll from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-transport-classes-kqueue from 4.1.93.Final to 4.1.96.Final\n+ io.netty.netty-transport-native-epoll from 4.1.93.Final-linux-x86\\_64 to 4.1.96.Final-linux-x86\\_64\n+ io.netty.netty-transport-native-kqueue from 4.1.93.Final-osx-x86\\_64 to 4.1.96.Final-osx-x86\\_64\n+ io.netty.netty-transport-native-unix-common from 4.1.93.Final to 4.1.96.Final\n+ net.snowflake.snowflake-jdbc from 3.13.29 to 3.13.33\n+ org.apache.orc.orc-core from 1.9.0-shaded-protobuf to 1.9.1-shaded-protobuf\n+ org.apache.orc.orc-mapreduce from 1.9.0-shaded-protobuf to 1.9.1-shaded-protobuf\n+ 
org.apache.orc.orc-shims from 1.9.0 to 1.9.1\n\n", "chunk_id": "bdb5cd71e14839993819b75cc0baae1c", "url": "https://docs.databricks.com/release-notes/runtime/14.1.html"} +{"chunked_text": "# Databricks release notes\n## Databricks Runtime release notes versions and compatibility\n#### Databricks Runtime 14.1\n##### Apache Spark\n\nDatabricks Runtime 14.1 includes Apache Spark 3.5.0. This release includes all Spark fixes and improvements\nincluded in [Databricks Runtime 14.0 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/14.0.html), as well as the following additional bug fixes and improvements made to Spark: \n* [[SPARK-45088]](https://issues.apache.org/jira/browse/SPARK-45088) [DBRRM-557] Revert \u201c[SC-142785][PYTHON][CONNECT] Make getitem work with duplicated column\u201d\n* [[SPARK-43453]](https://issues.apache.org/jira/browse/SPARK-43453) [DBRRM-557] Revert \u201c[SC-143135][PS] Ignore the `names` of `MultiIndex` when `axis=1` for `concat`\u201d\n* [[SPARK-45225]](https://issues.apache.org/jira/browse/SPARK-45225) [SC-143207][SQL] XML: XSD file URL support\n* [[SPARK-45156]](https://issues.apache.org/jira/browse/SPARK-45156) [SC-142782][SQL] Wrap `inputName` by backticks in the `NON_FOLDABLE_INPUT` error class\n* [[SPARK-44910]](https://issues.apache.org/jira/browse/SPARK-44910) [SC-143082][SQL] Encoders.bean does not support superclasses with generic type arguments\n* [[SPARK-43453]](https://issues.apache.org/jira/browse/SPARK-43453) [SC-143135][PS] Ignore the `names` of `MultiIndex` when `axis=1` for `concat`\n* [[SPARK-44463]](https://issues.apache.org/jira/browse/SPARK-44463) [SS][CONNECT] Improve error handling for Connect streaming Python worker\n* [[SPARK-44960]](https://issues.apache.org/jira/browse/SPARK-44960) [SC-141023][UI] Unescape and consist error summary across UI pages\n* [[SPARK-44788]](https://issues.apache.org/jira/browse/SPARK-44788) [SC-142980][CONNECT][PYTHON][SQL] Add from\\_xml and schema\\_of\\_xml to pyspark, 
spark connect and sql function\n* [[SPARK-44614]](https://issues.apache.org/jira/browse/SPARK-44614) [SC-138460][PYTHON][CONNECT][3.5] Add missing packages in setup.py\n* [[SPARK-45151]](https://issues.apache.org/jira/browse/SPARK-45151) [SC-142861][CORE][UI] Task Level Thread Dump Support\n* [[SPARK-45056]](https://issues.apache.org/jira/browse/SPARK-45056) [SC-142779][PYTHON][SS][CONNECT] Termination tests for streamingQueryListener and foreachBatch\n* [[SPARK-45084]](https://issues.apache.org/jira/browse/SPARK-45084) [SC-142828][SS] StateOperatorProgress to use accurate effective shuffle partition number\n* [[SPARK-44872]](https://issues.apache.org/jira/browse/SPARK-44872) [SC-142405][CONNECT] Server testing infra and ReattachableExecuteSuite\n* [[SPARK-45197]](https://issues.apache.org/jira/browse/SPARK-45197) [SC-142984][CORE] Make `StandaloneRestServer` add `JavaModuleOptions` to drivers\n* [[SPARK-44404]](https://issues.apache.org/jira/browse/SPARK-44404) [SC-139601][SQL] Assign names to the error class `_LEGACY_ERROR_TEMP_[1009,1010,1013,1015,1016,1278]`\n* [[SPARK-44647]](https://issues.apache.org/jira/browse/SPARK-44647) [SC-142297][SQL] Support SPJ where join keys are less than cluster keys\n* [[SPARK-45088]](https://issues.apache.org/jira/browse/SPARK-45088) [SC-142785][PYTHON][CONNECT] Make `getitem` work with duplicated columns\n* [[SPARK-45128]](https://issues.apache.org/jira/browse/SPARK-45128) [SC-142851][SQL] Support `CalendarIntervalType` in Arrow\n* [[SPARK-45130]](https://issues.apache.org/jira/browse/SPARK-45130) [SC-142976][CONNECT][ML][PYTHON] Avoid Spark connect ML model to change input pandas dataframe\n* [[SPARK-45034]](https://issues.apache.org/jira/browse/SPARK-45034) [SC-142959][SQL] Support deterministic mode function\n* [[SPARK-45173]](https://issues.apache.org/jira/browse/SPARK-45173) [SC-142931][UI] Remove some unnecessary sourceMapping files in UI\n* [[SPARK-45162]](https://issues.apache.org/jira/browse/SPARK-45162) 
[SC-142781][SQL] Support maps and array parameters constructed via `call_function`\n* [[SPARK-45143]](https://issues.apache.org/jira/browse/SPARK-45143) [SC-142840][PYTHON][CONNECT] Make PySpark compatible with PyArrow 13.0.0\n* [[SPARK-45174]](https://issues.apache.org/jira/browse/SPARK-45174) [SC-142837][CORE] Support `spark.deploy.maxDrivers`\n* [[SPARK-45167]](https://issues.apache.org/jira/browse/SPARK-45167) [SC-142956][CONNECT][PYTHON] Python client must call `release_all`\n* [[SPARK-36191]](https://issues.apache.org/jira/browse/SPARK-36191) [SC-142777][SQL] Handle limit and order by in correlated scalar (lateral) subqueries\n* [[SPARK-45159]](https://issues.apache.org/jira/browse/SPARK-45159) [SC-142829][PYTHON] Handle named arguments only when necessary\n* [[SPARK-45133]](https://issues.apache.org/jira/browse/SPARK-45133) [SC-142512][CONNECT] Make Spark Connect queries be FINISHED when last result task is finished\n* [[SPARK-44801]](https://issues.apache.org/jira/browse/SPARK-44801) [SC-140802][SQL][UI] Capture analyzing failed queries in Listener and UI\n* [[SPARK-45139]](https://issues.apache.org/jira/browse/SPARK-45139) [SC-142527][SQL] Add DatabricksDialect to handle SQL type conversion\n* [[SPARK-45157]](https://issues.apache.org/jira/browse/SPARK-45157) [SC-142546][SQL] Avoid repeated `if` checks in `[On|Off]HeapColumnVector`\n* [[SPARK-45077]](https://issues.apache.org/jira/browse/SPARK-45077) Revert \u201c[SC-142069][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4\u201d\n* [[SPARK-45145]](https://issues.apache.org/jira/browse/SPARK-45145) [SC-142521][EXAMPLE] Add JavaSparkSQLCli example\n* [[SPARK-43295]](https://issues.apache.org/jira/browse/SPARK-43295) Revert \u201c[SC-142254][PS] Support string type columns for `DataFrameGroupBy.sum`\u201d\n* [[SPARK-44915]](https://issues.apache.org/jira/browse/SPARK-44915) [SC-142383][CORE] Validate checksum of remounted PVC\u2019s shuffle data before recovery\n* 
[[SPARK-45147]](https://issues.apache.org/jira/browse/SPARK-45147) [SC-142524][CORE] Remove `System.setSecurityManager` usage\n* [[SPARK-45104]](https://issues.apache.org/jira/browse/SPARK-45104) [SC-142206][UI] Upgrade `graphlib-dot.min.js` to 1.0.2\n* [[SPARK-44238]](https://issues.apache.org/jira/browse/SPARK-44238) [SC-141606][CORE][SQL] Introduce a new `readFrom` method with byte array input for `BloomFilter`\n* [[SPARK-45060]](https://issues.apache.org/jira/browse/SPARK-45060) [SC-141742][SQL] Fix an internal error from `to_char()` on `NULL` format\n* [[SPARK-43252]](https://issues.apache.org/jira/browse/SPARK-43252) [SC-142381][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2016` with an internal error\n* [[SPARK-45069]](https://issues.apache.org/jira/browse/SPARK-45069) [SC-142279][SQL] SQL variable should always be resolved after outer reference\n* [[SPARK-44911]](https://issues.apache.org/jira/browse/SPARK-44911) [SC-142388][SQL] Create hive table with invalid column should return error class\n* [[SPARK-42754]](https://issues.apache.org/jira/browse/SPARK-42754) [SC-125567][SQL][UI] Fix backward compatibility issue in nested SQL execution\n* [[SPARK-45121]](https://issues.apache.org/jira/browse/SPARK-45121) [SC-142375][CONNECT][PS] Support `Series.empty` for Spark Connect.\n* [[SPARK-44805]](https://issues.apache.org/jira/browse/SPARK-44805) [SC-142203][SQL] getBytes/getShorts/getInts/etc. 
should work in a column vector that has a dictionary\n* [[SPARK-45027]](https://issues.apache.org/jira/browse/SPARK-45027) [SC-142248][PYTHON] Hide internal functions/variables in `pyspark.sql.functions` from auto-completion\n* [[SPARK-45073]](https://issues.apache.org/jira/browse/SPARK-45073) [SC-141791][PS][CONNECT] Replace `LastNotNull` with `Last(ignoreNulls=True)`\n* [[SPARK-44901]](https://issues.apache.org/jira/browse/SPARK-44901) [SC-141676][SQL] Manual backport: Add API in Python UDTF \u2018analyze\u2019 method to return partitioning/ordering expressions\n* [[SPARK-45076]](https://issues.apache.org/jira/browse/SPARK-45076) [SC-141795][PS] Switch to built-in `repeat` function\n* [[SPARK-44162]](https://issues.apache.org/jira/browse/SPARK-44162) [SC-141605][CORE] Support G1GC in spark metrics\n* [[SPARK-45053]](https://issues.apache.org/jira/browse/SPARK-45053) [SC-141733][PYTHON][MINOR] Log improvement in python version mismatch\n* [[SPARK-44866]](https://issues.apache.org/jira/browse/SPARK-44866) [SC-142221][SQL] Add `SnowflakeDialect` to handle BOOLEAN type correctly\n* [[SPARK-45064]](https://issues.apache.org/jira/browse/SPARK-45064) [SC-141775][PYTHON][CONNECT] Add the missing `scale` parameter in `ceil/ceiling`\n* [[SPARK-45059]](https://issues.apache.org/jira/browse/SPARK-45059) [SC-141757][CONNECT][PYTHON] Add `try_reflect` functions to Scala and Python\n* [[SPARK-43251]](https://issues.apache.org/jira/browse/SPARK-43251) [SC-142280][SQL] Replace the error class `_LEGACY_ERROR_TEMP_2015` with an internal error\n* [[SPARK-45052]](https://issues.apache.org/jira/browse/SPARK-45052) [SC-141736][SQL][PYTHON][CONNECT] Make function aliases output column name consistent with SQL\n* [[SPARK-44239]](https://issues.apache.org/jira/browse/SPARK-44239) [SC-141502][SQL] Free memory allocated by large vectors when vectors are reset\n* [[SPARK-43295]](https://issues.apache.org/jira/browse/SPARK-43295) [SC-142254][PS] Support string type columns for 
`DataFrameGroupBy.sum`\n* [[SPARK-45080]](https://issues.apache.org/jira/browse/SPARK-45080) [SC-142062][SS] Explicitly call out support for columnar in DSv2 streaming data sources\n* [[SPARK-45036]](https://issues.apache.org/jira/browse/SPARK-45036) [SC-141768][SQL] SPJ: Simplify the logic to handle partially clustered distribution\n* [[SPARK-45077]](https://issues.apache.org/jira/browse/SPARK-45077) [SC-142069][UI] Upgrade dagre-d3.js from 0.4.3 to 0.6.4\n* [[SPARK-45091]](https://issues.apache.org/jira/browse/SPARK-45091) [SC-142020][PYTHON][CONNECT][SQL] Function `floor/round/bround` accept Column type `scale`\n* [[SPARK-45090]](https://issues.apache.org/jira/browse/SPARK-45090) [SC-142019][PYTHON][CONNECT] `DataFrame.{cube, rollup}` support column ordinals\n* [[SPARK-44743]](https://issues.apache.org/jira/browse/SPARK-44743) [SC-141625][SQL] Add `try_reflect` function\n* [[SPARK-45086]](https://issues.apache.org/jira/browse/SPARK-45086) [SC-142052][UI] Display hexadecimal for thread lock hash code\n* [[SPARK-44952]](https://issues.apache.org/jira/browse/SPARK-44952) [SC-141644][SQL][PYTHON] Support named arguments in aggregate Pandas UDFs\n* [[SPARK-44987]](https://issues.apache.org/jira/browse/SPARK-44987) [SC-141552][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_1100`\n* [[SPARK-45032]](https://issues.apache.org/jira/browse/SPARK-45032) [SC-141730][CONNECT] Fix compilation warnings related to `Top-level wildcard is not allowed and will error under -Xsource:3`\n* [[SPARK-45048]](https://issues.apache.org/jira/browse/SPARK-45048) [SC-141629][CONNECT] Add additional tests for Python client and attachable execution\n* [[SPARK-45072]](https://issues.apache.org/jira/browse/SPARK-45072) [SC-141807][CONNECT] Fix outer scopes for ammonite classes\n* [[SPARK-45033]](https://issues.apache.org/jira/browse/SPARK-45033) [SC-141759][SQL] Support maps by parameterized `sql()`\n* [[SPARK-45066]](https://issues.apache.org/jira/browse/SPARK-45066) 
[SC-141772][SQL][PYTHON][CONNECT] Make function `repeat` accept column-type `n`\n* [[SPARK-44860]](https://issues.apache.org/jira/browse/SPARK-44860) [SC-141103][SQL] Add SESSION\\_USER function\n* [[SPARK-45074]](https://issues.apache.org/jira/browse/SPARK-45074) [SC-141796][PYTHON][CONNECT] `DataFrame.{sort, sortWithinPartitions}` support column ordinals\n* [[SPARK-45047]](https://issues.apache.org/jira/browse/SPARK-45047) [SC-141774][PYTHON][CONNECT] `DataFrame.groupBy` support ordinals\n* [[SPARK-44863]](https://issues.apache.org/jira/browse/SPARK-44863) [SC-140798][UI] Add a button to download thread dump as a txt in Spark UI\n* [[SPARK-45026]](https://issues.apache.org/jira/browse/SPARK-45026) [SC-141604][CONNECT] `spark.sql` should support datatypes not compatible with arrow\n* [[SPARK-44999]](https://issues.apache.org/jira/browse/SPARK-44999) [SC-141145][CORE] Refactor `ExternalSorter` to reduce checks on `shouldPartition` when calling `getPartition`\n* [[SPARK-42304]](https://issues.apache.org/jira/browse/SPARK-42304) [SC-141501][SQL] Rename `_LEGACY_ERROR_TEMP_2189` to `GET_TABLES_BY_TYPE_UNSUPPORTED_BY_HIVE_VERSION`\n* [[SPARK-43781]](https://issues.apache.org/jira/browse/SPARK-43781) [SC-139450][SQL] Fix IllegalStateException when cogrouping two datasets derived from the same source\n* [[SPARK-45018]](https://issues.apache.org/jira/browse/SPARK-45018) [SC-141516][PYTHON][CONNECT] Add CalendarIntervalType to Python Client\n* [[SPARK-45024]](https://issues.apache.org/jira/browse/SPARK-45024) [SC-141513][PYTHON][CONNECT] Filter out some configurations in Session Creation\n* [[SPARK-45017]](https://issues.apache.org/jira/browse/SPARK-45017) [SC-141508][PYTHON] Add `CalendarIntervalType` to PySpark\n* [[SPARK-44720]](https://issues.apache.org/jira/browse/SPARK-44720) [SC-139375][CONNECT] Make Dataset use Encoder instead of AgnosticEncoder\n* [[SPARK-44982]](https://issues.apache.org/jira/browse/SPARK-44982) [SC-141027][CONNECT] Mark Spark Connect server 
configurations as static\n* [[SPARK-44839]](https://issues.apache.org/jira/browse/SPARK-44839) [SC-140900][SS][CONNECT] Better Error Logging when user tries to serialize spark session\n* [[SPARK-44865]](https://issues.apache.org/jira/browse/SPARK-44865) [SC-140905][SS] Make StreamingRelationV2 support metadata column\n* [[SPARK-45001]](https://issues.apache.org/jira/browse/SPARK-45001) [SC-141141][PYTHON][CONNECT] Implement DataFrame.foreachPartition\n* [[SPARK-44497]](https://issues.apache.org/jira/browse/SPARK-44497) [SC-141017][WEBUI] Show task partition id in Task table\n* [[SPARK-45006]](https://issues.apache.org/jira/browse/SPARK-45006) [SC-141143][UI] Use the same date format of other UI date elements for the x-axis of timelines\n* [[SPARK-45000]](https://issues.apache.org/jira/browse/SPARK-45000) [SC-141135][PYTHON][CONNECT] Implement DataFrame.foreach\n* [[SPARK-44967]](https://issues.apache.org/jira/browse/SPARK-44967) [SC-141137][SQL][CONNECT] Unit should be considered first before using Boolean for TreeNodeTag\n* [[SPARK-44993]](https://issues.apache.org/jira/browse/SPARK-44993) [SC-141088][CORE] Add `ShuffleChecksumUtils.compareChecksums` by reusing `ShuffleChecksumTestHelp.compareChecksums`\n* [[SPARK-44807]](https://issues.apache.org/jira/browse/SPARK-44807) [SC-140176][CONNECT] Add Dataset.metadataColumn to Scala Client\n* [[SPARK-44965]](https://issues.apache.org/jira/browse/SPARK-44965) [SC-141098][PYTHON] Hide internal functions/variables from `pyspark.sql.functions`\n* [[SPARK-44983]](https://issues.apache.org/jira/browse/SPARK-44983) [SC-141030][SQL] Convert binary to string by `to_char` for the formats: `hex`, `base64`, `utf-8`\n* [[SPARK-44980]](https://issues.apache.org/jira/browse/SPARK-44980) [DBRRM-462][SC-141024][PYTHON][CONNECT] Fix inherited namedtuples to work in createDataFrame\n* [[SPARK-44985]](https://issues.apache.org/jira/browse/SPARK-44985) [SC-141033][CORE] Use toString instead of stacktrace for task reaper threadDump\n* 
[[SPARK-44984]](https://issues.apache.org/jira/browse/SPARK-44984) [SC-141028][PYTHON][CONNECT] Remove `_get_alias` from DataFrame\n* [[SPARK-44975]](https://issues.apache.org/jira/browse/SPARK-44975) [SC-141013][SQL] Remove BinaryArithmetic useless override resolved\n* [[SPARK-44969]](https://issues.apache.org/jira/browse/SPARK-44969) [SC-140957][SQL] Reuse `ArrayInsert` in `ArrayAppend`\n* [[SPARK-44549]](https://issues.apache.org/jira/browse/SPARK-44549) [SC-140714][SQL] Support window functions in correlated scalar subqueries\n* [[SPARK-44938]](https://issues.apache.org/jira/browse/SPARK-44938) [SC-140811][SQL] Change default value of `spark.sql.maxSinglePartitionBytes` to 128m\n* [[SPARK-44918]](https://issues.apache.org/jira/browse/SPARK-44918) [SC-140816][SQL][PYTHON] Support named arguments in scalar Python/Pandas UDFs\n* [[SPARK-44966]](https://issues.apache.org/jira/browse/SPARK-44966) [SC-140907][CORE][CONNECT] Change the never changed `var` to `val`\n* [[SPARK-41471]](https://issues.apache.org/jira/browse/SPARK-41471) [SC-140804][SQL] Reduce Spark shuffle when only one side of a join is KeyGroupedPartitioning\n* [[SPARK-44214]](https://issues.apache.org/jira/browse/SPARK-44214) [SC-140528][CORE] Support Spark Driver Live Log UI\n* [[SPARK-44861]](https://issues.apache.org/jira/browse/SPARK-44861) [SC-140716][CONNECT] jsonignore SparkListenerConnectOperationStarted.planRequest\n* [[SPARK-44776]](https://issues.apache.org/jira/browse/SPARK-44776) [SC-140519][CONNECT] Add ProducedRowCount to SparkListenerConnectOperationFinished\n* [[SPARK-42017]](https://issues.apache.org/jira/browse/SPARK-42017) [SC-140765][PYTHON][CONNECT] `df['col_name']` should validate the column name\n* [[SPARK-40178]](https://issues.apache.org/jira/browse/SPARK-40178) [SC-140433][SQL][CONNECT] Support coalesce hints with ease for PySpark and R\n* [[SPARK-44840]](https://issues.apache.org/jira/browse/SPARK-44840) [SC-140593][SQL] Make `array_insert()` 1-based for negative indexes\n* 
[[SPARK-44939]](https://issues.apache.org/jira/browse/SPARK-44939) [SC-140778][R] Support Java 21 in SparkR SystemRequirements\n* [[SPARK-44936]](https://issues.apache.org/jira/browse/SPARK-44936) [SC-140770][CORE] Simplify the log when Spark HybridStore hits the memory limit\n* [[SPARK-44908]](https://issues.apache.org/jira/browse/SPARK-44908) [SC-140712][ML][CONNECT] Fix cross validator foldCol param functionality\n* [[SPARK-44816]](https://issues.apache.org/jira/browse/SPARK-44816) [SC-140717][CONNECT] Improve error message when UDF class is not found\n* [[SPARK-44909]](https://issues.apache.org/jira/browse/SPARK-44909) [SC-140710][ML] Skip starting torch distributor log streaming server when it is not available\n* [[SPARK-44920]](https://issues.apache.org/jira/browse/SPARK-44920) [SC-140707][CORE] Use await() instead of awaitUninterruptibly() in TransportClientFactory.createClient()\n* [[SPARK-44905]](https://issues.apache.org/jira/browse/SPARK-44905) [SC-140703][SQL] Stateful lastRegex causes NullPointerException on eval for regexp\\_replace\n* [[SPARK-43987]](https://issues.apache.org/jira/browse/SPARK-43987) [SC-139594][Shuffle] Separate finalizeShuffleMerge Processing to Dedicated Thread Pools\n* [[SPARK-42768]](https://issues.apache.org/jira/browse/SPARK-42768) [SC-140549][SQL] Enable cached plan apply AQE by default\n* [[SPARK-44741]](https://issues.apache.org/jira/browse/SPARK-44741) [SC-139447][CORE] Support regex-based MetricFilter in `StatsdSink`\n* [[SPARK-44751]](https://issues.apache.org/jira/browse/SPARK-44751) [SC-140532][SQL] XML FileFormat Interface implementation\n* [[SPARK-44868]](https://issues.apache.org/jira/browse/SPARK-44868) [SC-140438][SQL] Convert datetime to string by `to_char`/`to_varchar`\n* [[SPARK-44748]](https://issues.apache.org/jira/browse/SPARK-44748) [SC-140504][SQL] Query execution for the PARTITION BY clause in UDTF TABLE arguments\n* [[SPARK-44873]](https://issues.apache.org/jira/browse/SPARK-44873) [SC-140427] Support 
alter view with nested columns in Hive client\n* [[SPARK-44876]](https://issues.apache.org/jira/browse/SPARK-44876) [SC-140431][PYTHON] Fix Arrow-optimized Python UDF on Spark Connect\n* [[SPARK-44520]](https://issues.apache.org/jira/browse/SPARK-44520) [SC-137845][SQL] Replace the term UNSUPPORTED\\_DATA\\_SOURCE\\_FOR\\_DIRECT\\_QUERY with UNSUPPORTED\\_DATASOURCE\\_FOR\\_DIRECT\\_QUERY and disclosure root AE\n* [[SPARK-42664]](https://issues.apache.org/jira/browse/SPARK-42664) [SC-139769][CONNECT] Support `bloomFilter` function for `DataFrameStatFunctions`\n* [[SPARK-43567]](https://issues.apache.org/jira/browse/SPARK-43567) [SC-139227][PS] Support `use_na_sentinel` for `factorize`\n* [[SPARK-44859]](https://issues.apache.org/jira/browse/SPARK-44859) [SC-140254][SS] Fix incorrect property name in structured streaming doc\n* [[SPARK-44822]](https://issues.apache.org/jira/browse/SPARK-44822) [SC-140182][PYTHON][SQL] Make Python UDTFs by default non-deterministic\n* [[SPARK-44731]](https://issues.apache.org/jira/browse/SPARK-44731) [SC-139524][PYTHON][CONNECT] Make TimestampNTZ work with literals in Python Spark Connect\n* [[SPARK-44836]](https://issues.apache.org/jira/browse/SPARK-44836) [SC-140180][PYTHON] Refactor Arrow Python UDTF\n* [[SPARK-44714]](https://issues.apache.org/jira/browse/SPARK-44714) [SC-139238] Ease restriction of LCA resolution regarding queries with having\n* [[SPARK-44749]](https://issues.apache.org/jira/browse/SPARK-44749) [SC-139664][SQL][PYTHON] Support named arguments in Python UDTF\n* [[SPARK-44737]](https://issues.apache.org/jira/browse/SPARK-44737) [SC-139512][SQL][UI] Should not display json format errors on SQL page for non-SparkThrowables on SQL Tab\n* [[SPARK-44665]](https://issues.apache.org/jira/browse/SPARK-44665) [SC-139307][PYTHON] Add support for pandas DataFrame assertDataFrameEqual\n* [[SPARK-44736]](https://issues.apache.org/jira/browse/SPARK-44736) [SC-139622][CONNECT] Add Dataset.explode to Spark Connect Scala 
Client\n* [[SPARK-44732]](https://issues.apache.org/jira/browse/SPARK-44732) [SC-139422][SQL] Built-in XML data source support\n* [[SPARK-44694]](https://issues.apache.org/jira/browse/SPARK-44694) [SC-139213][PYTHON][CONNECT] Refactor active sessions and expose them as an API\n* [[SPARK-44652]](https://issues.apache.org/jira/browse/SPARK-44652) [SC-138881] Raise error when only one df is None\n* [[SPARK-44562]](https://issues.apache.org/jira/browse/SPARK-44562) [SC-138824][SQL] Add OptimizeOneRowRelationSubquery in batch of Subquery\n* [[SPARK-44717]](https://issues.apache.org/jira/browse/SPARK-44717) [SC-139319][PYTHON][PS] Respect TimestampNTZ in resampling\n* [[SPARK-42849]](https://issues.apache.org/jira/browse/SPARK-42849) [SC-139365] [SQL] Session Variables\n* [[SPARK-44236]](https://issues.apache.org/jira/browse/SPARK-44236) [SC-139239][SQL] Disable WholeStageCodegen when set `spark.sql.codegen.factoryMode` to NO\\_CODEGEN\n* [[SPARK-44695]](https://issues.apache.org/jira/browse/SPARK-44695) [SC-139316][PYTHON] Improve error message for `DataFrame.toDF`\n* [[SPARK-44680]](https://issues.apache.org/jira/browse/SPARK-44680) [SC-139234][SQL] Improve the error for parameters in `DEFAULT`\n* [[SPARK-43402]](https://issues.apache.org/jira/browse/SPARK-43402) [SC-138321][SQL] FileSourceScanExec supports push down data filter with scalar subquery\n* [[SPARK-44641]](https://issues.apache.org/jira/browse/SPARK-44641) [SC-139216][SQL] Incorrect result in certain scenarios when SPJ is not triggered\n* [[SPARK-44689]](https://issues.apache.org/jira/browse/SPARK-44689) [SC-139219][CONNECT] Make the exception handling of function `SparkConnectPlanner#unpackScalarScalaUDF` more universal\n* [[SPARK-41636]](https://issues.apache.org/jira/browse/SPARK-41636) [SC-139061][SQL] Make sure `selectFilters` returns predicates in deterministic order\n* [[SPARK-44132]](https://issues.apache.org/jira/browse/SPARK-44132) [SC-139197][SQL] Materialize `Stream` of join column names to 
avoid codegen failure\n* [[SPARK-42330]](https://issues.apache.org/jira/browse/SPARK-42330) [SC-138838][SQL] Assign the name `RULE_ID_NOT_FOUND` to the error class `_LEGACY_ERROR_TEMP_2175`\n* [[SPARK-44683]](https://issues.apache.org/jira/browse/SPARK-44683) [SC-139214][SS] Logging level isn\u2019t passed to RocksDB state store provider correctly\n* [[SPARK-44658]](https://issues.apache.org/jira/browse/SPARK-44658) [SC-138868][CORE] `ShuffleStatus.getMapStatus` should return `None` instead of `Some(null)`\n* [[SPARK-44603]](https://issues.apache.org/jira/browse/SPARK-44603) [SC-138353] Add pyspark.testing to setup.py\n* [[SPARK-44252]](https://issues.apache.org/jira/browse/SPARK-44252) [SC-137505][SS] Define a new error class and apply for the case where loading state from DFS fails\n* [[SPARK-29497]](https://issues.apache.org/jira/browse/SPARK-29497) [DBRRM-396][SC-138477][CONNECT] Throw error when UDF is not deserializable.\n* [[SPARK-44624]](https://issues.apache.org/jira/browse/SPARK-44624) [DBRRM-396][SC-138900][CONNECT] Retry ExecutePlan in case initial request didn\u2019t reach server\n* [[SPARK-41400]](https://issues.apache.org/jira/browse/SPARK-41400) [DBRRM-396][SC-138287][CONNECT] Remove Connect Client Catalyst Dependency\n* [[SPARK-44059]](https://issues.apache.org/jira/browse/SPARK-44059) [SC-138833][SQL] Add better error messages for SQL named arguments\n* [[SPARK-44620]](https://issues.apache.org/jira/browse/SPARK-44620) [SC-138831][SQL][PS][CONNECT] Make `ResolvePivot` retain the `Plan_ID_TAG`\n* [[SPARK-43838]](https://issues.apache.org/jira/browse/SPARK-43838) [SC-137413][SQL] Fix subquery on single table with having clause can\u2019t be optimized\n* [[SPARK-44555]](https://issues.apache.org/jira/browse/SPARK-44555) [SC-138820][SQL] Use checkError() to check Exception in command Suite & assign some error class names\n* [[SPARK-44280]](https://issues.apache.org/jira/browse/SPARK-44280) [SC-138821][SQL] Add convertJavaTimestampToTimestamp in 
JDBCDialect API\n* [[SPARK-44602]](https://issues.apache.org/jira/browse/SPARK-44602) [SC-138337][SQL][CONNECT][PS] Make `WidenSetOperationTypes` retain the `Plan_ID_TAG`\n* [[SPARK-42941]](https://issues.apache.org/jira/browse/SPARK-42941) [SC-138389][SS][CONNECT] Python StreamingQueryListener\n* [[SPARK-43838]](https://issues.apache.org/jira/browse/SPARK-43838) Revert \u201c[SC-137413][SQL] Fix subquery on single ta\u2026\n* [[SPARK-44538]](https://issues.apache.org/jira/browse/SPARK-44538) [SC-138178][CONNECT][SQL] Reinstate Row.jsonValue and friends\n* [[SPARK-44421]](https://issues.apache.org/jira/browse/SPARK-44421) [SC-138434][SPARK-44423][CONNECT] Reattachable execution in Spark Connect\n* [[SPARK-43838]](https://issues.apache.org/jira/browse/SPARK-43838) [SC-137413][SQL] Fix subquery on single table with having clause can\u2019t be optimized\n* [[SPARK-44587]](https://issues.apache.org/jira/browse/SPARK-44587) [SC-138315][SQL][CONNECT] Increase protobuf marshaller recursion limit\n* [[SPARK-44605]](https://issues.apache.org/jira/browse/SPARK-44605) [SC-138338][CORE] Refine internal ShuffleWriteProcessor API\n* [[SPARK-44394]](https://issues.apache.org/jira/browse/SPARK-44394) [SC-138291][CONNECT][WEBUI] Add a Spark UI page for Spark Connect\n* [[SPARK-44585]](https://issues.apache.org/jira/browse/SPARK-44585) [SC-138286][MLLIB] Fix warning condition in MLLib RankingMetrics ndcgAk\n* [[SPARK-44198]](https://issues.apache.org/jira/browse/SPARK-44198) [SC-138137][CORE] Support propagation of the log level to the executors\n* [[SPARK-44454]](https://issues.apache.org/jira/browse/SPARK-44454) [SC-138071][SQL][HIVE] HiveShim getTablesByType support fallback\n* [[SPARK-44425]](https://issues.apache.org/jira/browse/SPARK-44425) [SC-138135][CONNECT] Validate that user provided sessionId is a UUID\n* [[SPARK-43611]](https://issues.apache.org/jira/browse/SPARK-43611) [SC-138051][SQL][PS][CONNECT] Make `ExtractWindowExpressions` retain the `PLAN_ID_TAG`\n* 
[[SPARK-44560]](https://issues.apache.org/jira/browse/SPARK-44560) [SC-138117][PYTHON][CONNECT] Improve tests and documentation for Arrow Python UDF\n* [[SPARK-44482]](https://issues.apache.org/jira/browse/SPARK-44482) [SC-138067][CONNECT] Connect server should be able to specify the bind address\n* [[SPARK-44528]](https://issues.apache.org/jira/browse/SPARK-44528) [SC-138047][CONNECT] Support proper usage of hasattr() for Connect dataframe\n* [[SPARK-44525]](https://issues.apache.org/jira/browse/SPARK-44525) [SC-138043][SQL] Improve error message when Invoke method is not found\n* [[SPARK-44355]](https://issues.apache.org/jira/browse/SPARK-44355) [SC-137878][SQL] Move WithCTE into command queries\n\n", "chunk_id": "3d014142d5436569874c00df2ada6525", "url": "https://docs.databricks.com/release-notes/runtime/14.1.html"} +{"chunked_text": "# Databricks release notes\n## Databricks Runtime release notes versions and compatibility\n#### Databricks Runtime 14.1\n##### Databricks ODBC/JDBC driver support\n\nDatabricks supports ODBC/JDBC drivers released in the past 2 years. 
Please download the recently released drivers and\nupgrade ([download ODBC](https://databricks.com/spark/odbc-driver-download), [download JDBC](https://www.databricks.com/spark/jdbc-drivers-download)).\n\n", "chunk_id": "aaa96bbc0bfd007064bed157872beb09", "url": "https://docs.databricks.com/release-notes/runtime/14.1.html"} +{"chunked_text": "# Databricks release notes\n## Databricks Runtime release notes versions and compatibility\n#### Databricks Runtime 14.1\n##### System environment\n\n* **Operating System**: Ubuntu 22.04.3 LTS\n* **Java**: Zulu 8.72.0.17-CA-linux64\n* **Scala**: 2.12.15\n* **Python**: 3.10.12\n* **R**: 4.3.1\n* **Delta Lake**: 3.0.0 \n### Installed Python libraries \n| Library | Version | Library | Version | Library | Version |\n| --- | --- | --- | --- | --- | --- |\n| anyio | 3.5.0 | argon2-cffi | 21.3.0 | argon2-cffi-bindings | 21.2.0 |\n| asttokens | 2.0.5 | attrs | 22.1.0 | backcall | 0.2.0 |\n| beautifulsoup4 | 4.11.1 | black | 22.6.0 | bleach | 4.1.0 |\n| blinker | 1.4 | boto3 | 1.24.28 | botocore | 1.27.96 |\n| certifi | 2022.12.7 | cffi | 1.15.1 | chardet | 4.0.0 |\n| charset-normalizer | 2.0.4 | click | 8.0.4 | comm | 0.1.2 |\n| contourpy | 1.0.5 | cryptography | 39.0.1 | cycler | 0.11.0 |\n| Cython | 0.29.32 | databricks-sdk | 0.1.6 | dbus-python | 1.2.18 |\n| debugpy | 1.6.7 | decorator | 5.1.1 | defusedxml | 0.7.1 |\n| distlib | 0.3.7 | docstring-to-markdown | 0.11 | entrypoints | 0.4 |\n| executing | 0.8.3 | facets-overview | 1.1.1 | fastjsonschema | 2.18.0 |\n| filelock | 3.12.3 | fonttools | 4.25.0 | GCC runtime library | 1.10.0 |\n| googleapis-common-protos | 1.60.0 | grpcio | 1.48.2 | grpcio-status | 1.48.1 |\n| httplib2 | 0.20.2 | idna | 3.4 | importlib-metadata | 4.6.4 |\n| ipykernel | 6.25.0 | ipython | 8.14.0 | ipython-genutils | 0.2.0 |\n| ipywidgets | 7.7.2 | jedi | 0.18.1 | jeepney | 0.7.1 |\n| Jinja2 | 3.1.2 | jmespath | 0.10.0 | joblib | 1.2.0 |\n| jsonschema | 4.17.3 
| jupyter-client | 7.3.4 | jupyter-server | 1.23.4 |\n| jupyter\\_core | 5.2.0 | jupyterlab-pygments | 0.1.2 | jupyterlab-widgets | 1.0.0 |\n| keyring | 23.5.0 | kiwisolver | 1.4.4 | launchpadlib | 1.10.16 |\n| lazr.restfulclient | 0.14.4 | lazr.uri | 1.0.6 | lxml | 4.9.1 |\n| MarkupSafe | 2.1.1 | matplotlib | 3.7.0 | matplotlib-inline | 0.1.6 |\n| mccabe | 0.7.0 | mistune | 0.8.4 | more-itertools | 8.10.0 |\n| mypy-extensions | 0.4.3 | nbclassic | 0.5.2 | nbclient | 0.5.13 |\n| nbconvert | 6.5.4 | nbformat | 5.7.0 | nest-asyncio | 1.5.6 |\n| nodeenv | 1.8.0 | notebook | 6.5.2 | notebook\\_shim | 0.2.2 |\n| numpy | 1.23.5 | oauthlib | 3.2.0 | packaging | 22.0 |\n| pandas | 1.5.3 | pandocfilters | 1.5.0 | parso | 0.8.3 |\n| pathspec | 0.10.3 | patsy | 0.5.3 | pexpect | 4.8.0 |\n| pickleshare | 0.7.5 | Pillow | 9.4.0 | pip | 22.3.1 |\n| platformdirs | 2.5.2 | plotly | 5.9.0 | pluggy | 1.0.0 |\n| prometheus-client | 0.14.1 | prompt-toolkit | 3.0.36 | protobuf | 4.24.0 |\n| psutil | 5.9.0 | psycopg2 | 2.9.3 | ptyprocess | 0.7.0 |\n| pure-eval | 0.2.2 | pyarrow | 8.0.0 | pycparser | 2.21 |\n| pydantic | 1.10.6 | pyflakes | 3.0.1 | Pygments | 2.11.2 |\n| PyGObject | 3.42.1 | PyJWT | 2.3.0 | pyodbc | 4.0.32 |\n| pyparsing | 3.0.9 | pyright | 1.1.294 | pyrsistent | 0.18.0 |\n| python-dateutil | 2.8.2 | python-lsp-jsonrpc | 1.0.0 | python-lsp-server | 1.7.1 |\n| pytoolconfig | 1.2.5 | pytz | 2022.7 | pyzmq | 23.2.0 |\n| requests | 2.28.1 | rope | 1.7.0 | s3transfer | 0.6.2 |\n| scikit-learn | 1.1.1 | seaborn | 0.12.2 | SecretStorage | 3.3.1 |\n| Send2Trash | 1.8.0 | setuptools | 65.6.3 | six | 1.16.0 |\n| sniffio | 1.2.0 | soupsieve | 2.3.2.post1 | ssh-import-id | 5.11 |\n| stack-data | 0.2.0 | statsmodels | 0.13.5 | tenacity | 8.1.0 |\n| terminado | 0.17.1 | threadpoolctl | 2.2.0 | tinycss2 | 1.2.1 |\n| tokenize-rt | 4.2.1 | tomli | 2.0.1 | tornado | 6.1 |\n| traitlets | 5.7.1 | typing\\_extensions | 4.4.0 | ujson | 5.4.0 |\n| unattended-upgrades | 0.1 | urllib3 | 1.26.14 
| virtualenv | 20.16.7 |\n| wadllib | 1.3.6 | wcwidth | 0.2.5 | webencodings | 0.5.1 |\n| websocket-client | 0.58.0 | whatthepatch | 1.0.2 | wheel | 0.38.4 |\n| widgetsnbextension | 3.6.1 | yapf | 0.31.0 | zipp | 1.0.0 |\n| SciPy | 1.10.1 | | | | | \n### Installed R libraries \nR libraries are installed from the Posit Package Manager CRAN snapshot on 2023-02-10. \n| Library | Version | Library | Version | Library | Version |\n| --- | --- | --- | --- | --- | --- |\n| arrow | 12.0.1 | askpass | 1.1 | assertthat | 0.2.1 |\n| backports | 1.4.1 | base | 4.3.1 | base64enc | 0.1-3 |\n| bit | 4.0.5 | bit64 | 4.0.5 | blob | 1.2.4 |\n| boot | 1.3-28 | brew | 1.0-8 | brio | 1.1.3 |\n| broom | 1.0.5 | bslib | 0.5.0 | cachem | 1.0.8 |\n| callr | 3.7.3 | caret | 6.0-94 | cellranger | 1.1.0 |\n| chron | 2.3-61 | class | 7.3-22 | cli | 3.6.1 |\n| clipr | 0.8.0 | clock | 0.7.0 | cluster | 2.1.4 |\n| codetools | 0.2-19 | colorspace | 2.1-0 | commonmark | 1.9.0 |\n| compiler | 4.3.1 | config | 0.3.1 | conflicted | 1.2.0 |\n| cpp11 | 0.4.4 | crayon | 1.5.2 | credentials | 1.3.2 |\n| curl | 5.0.1 | data.table | 1.14.8 | datasets | 4.3.1 |\n| DBI | 1.1.3 | dbplyr | 2.3.3 | desc | 1.4.2 |\n| devtools | 2.4.5 | diagram | 1.6.5 | diffobj | 0.3.5 |\n| digest | 0.6.33 | downlit | 0.4.3 | dplyr | 1.1.2 |\n| dtplyr | 1.3.1 | e1071 | 1.7-13 | ellipsis | 0.3.2 |\n| evaluate | 0.21 | fansi | 1.0.4 | farver | 2.1.1 |\n| fastmap | 1.1.1 | fontawesome | 0.5.1 | forcats | 1.0.0 |\n| foreach | 1.5.2 | foreign | 0.8-82 | forge | 0.2.0 |\n| fs | 1.6.2 | future | 1.33.0 | future.apply | 1.11.0 |\n| gargle | 1.5.1 | generics | 0.1.3 | gert | 1.9.2 |\n| ggplot2 | 3.4.2 | gh | 1.4.0 | gitcreds | 0.1.2 |\n| glmnet | 4.1-7 | globals | 0.16.2 | glue | 1.6.2 |\n| googledrive | 2.1.1 | googlesheets4 | 1.1.1 | gower | 1.0.1 |\n| graphics | 4.3.1 | grDevices | 4.3.1 | grid | 4.3.1 |\n| gridExtra | 2.3 | gsubfn | 0.7 | gtable | 0.3.3 |\n| hardhat | 1.3.0 | haven | 2.5.3 | highr | 0.10 |\n| hms | 1.1.3 | htmltools | 
0.5.5 | htmlwidgets | 1.6.2 |\n| httpuv | 1.6.11 | httr | 1.4.6 | httr2 | 0.2.3 |\n| ids | 1.0.1 | ini | 0.3.1 | ipred | 0.9-14 |\n| isoband | 0.2.7 | iterators | 1.0.14 | jquerylib | 0.1.4 |\n| jsonlite | 1.8.7 | KernSmooth | 2.23-21 | knitr | 1.43 |\n| labeling | 0.4.2 | later | 1.3.1 | lattice | 0.21-8 |\n| lava | 1.7.2.1 | lifecycle | 1.0.3 | listenv | 0.9.0 |\n| lubridate | 1.9.2 | magrittr | 2.0.3 | markdown | 1.7 |\n| MASS | 7.3-60 | Matrix | 1.5-4.1 | memoise | 2.0.1 |\n| methods | 4.3.1 | mgcv | 1.8-42 | mime | 0.12 |\n| miniUI | 0.1.1.1 | ModelMetrics | 1.2.2.2 | modelr | 0.1.11 |\n| munsell | 0.5.0 | nlme | 3.1-162 | nnet | 7.3-19 |\n| numDeriv | 2016.8-1.1 | openssl | 2.0.6 | parallel | 4.3.1 |\n| parallelly | 1.36.0 | pillar | 1.9.0 | pkgbuild | 1.4.2 |\n| pkgconfig | 2.0.3 | pkgdown | 2.0.7 | pkgload | 1.3.2.1 |\n| plogr | 0.2.0 | plyr | 1.8.8 | praise | 1.0.0 |\n| prettyunits | 1.1.1 | pROC | 1.18.4 | processx | 3.8.2 |\n| prodlim | 2023.03.31 | profvis | 0.3.8 | progress | 1.2.2 |\n| progressr | 0.13.0 | promises | 1.2.0.1 | proto | 1.0.0 |\n| proxy | 0.4-27 | ps | 1.7.5 | purrr | 1.0.1 |\n| r2d3 | 0.2.6 | R6 | 2.5.1 | ragg | 1.2.5 |\n| randomForest | 4.7-1.1 | rappdirs | 0.3.3 | rcmdcheck | 1.4.0 |\n| RColorBrewer | 1.1-3 | Rcpp | 1.0.11 | RcppEigen | 0.3.3.9.3 |\n| readr | 2.1.4 | readxl | 1.4.3 | recipes | 1.0.6 |\n| rematch | 1.0.1 | rematch2 | 2.1.2 | remotes | 2.4.2 |\n| reprex | 2.0.2 | reshape2 | 1.4.4 | rlang | 1.1.1 |\n| rmarkdown | 2.23 | RODBC | 1.3-20 | roxygen2 | 7.2.3 |\n| rpart | 4.1.19 | rprojroot | 2.0.3 | Rserve | 1.8-11 |\n| RSQLite | 2.3.1 | rstudioapi | 0.15.0 | rversions | 2.1.2 |\n| rvest | 1.0.3 | sass | 0.4.6 | scales | 1.2.1 |\n| selectr | 0.4-2 | sessioninfo | 1.2.2 | shape | 1.4.6 |\n| shiny | 1.7.4.1 | sourcetools | 0.1.7-1 | sparklyr | 1.8.1 |\n| SparkR | 3.5.0 | spatial | 7.3-15 | splines | 4.3.1 |\n| sqldf | 0.4-11 | SQUAREM | 2021.1 | stats | 4.3.1 |\n| stats4 | 4.3.1 | stringi | 1.7.12 | stringr | 1.5.0 |\n| 
survival | 3.5-5 | sys | 3.4.2 | systemfonts | 1.0.4 |\n| tcltk | 4.3.1 | testthat | 3.1.10 | textshaping | 0.3.6 |\n| tibble | 3.2.1 | tidyr | 1.3.0 | tidyselect | 1.2.0 |\n| tidyverse | 2.0.0 | timechange | 0.2.0 | timeDate | 4022.108 |\n| tinytex | 0.45 | tools | 4.3.1 | tzdb | 0.4.0 |\n| urlchecker | 1.0.1 | usethis | 2.2.2 | utf8 | 1.2.3 |\n| utils | 4.3.1 | uuid | 1.1-0 | vctrs | 0.6.3 |\n| viridisLite | 0.4.2 | vroom | 1.6.3 | waldo | 0.5.1 |\n| whisker | 0.4.1 | withr | 2.5.0 | xfun | 0.39 |\n| xml2 | 1.3.5 | xopen | 1.0.0 | xtable | 1.8-4 |\n| yaml | 2.3.7 | zip | 2.3.0 | | | \n### Installed Java and Scala libraries (Scala 2.12 cluster version) \n| Group ID | Artifact ID | Version |\n| --- | --- | --- |\n| antlr | antlr | 2.7.7 |\n| com.amazonaws | amazon-kinesis-client | 1.12.0 |\n| com.amazonaws | aws-java-sdk-autoscaling | 1.12.390 |\n| com.amazonaws | aws-java-sdk-cloudformation | 1.12.390 |\n| com.amazonaws | aws-java-sdk-cloudfront | 1.12.390 |\n| com.amazonaws | aws-java-sdk-cloudhsm | 1.12.390 |\n| com.amazonaws | aws-java-sdk-cloudsearch | 1.12.390 |\n| com.amazonaws | aws-java-sdk-cloudtrail | 1.12.390 |\n| com.amazonaws | aws-java-sdk-cloudwatch | 1.12.390 |\n| com.amazonaws | aws-java-sdk-cloudwatchmetrics | 1.12.390 |\n| com.amazonaws | aws-java-sdk-codedeploy | 1.12.390 |\n| com.amazonaws | aws-java-sdk-cognitoidentity | 1.12.390 |\n| com.amazonaws | aws-java-sdk-cognitosync | 1.12.390 |\n| com.amazonaws | aws-java-sdk-config | 1.12.390 |\n| com.amazonaws | aws-java-sdk-core | 1.12.390 |\n| com.amazonaws | aws-java-sdk-datapipeline | 1.12.390 |\n| com.amazonaws | aws-java-sdk-directconnect | 1.12.390 |\n| com.amazonaws | aws-java-sdk-directory | 1.12.390 |\n| com.amazonaws | aws-java-sdk-dynamodb | 1.12.390 |\n| com.amazonaws | aws-java-sdk-ec2 | 1.12.390 |\n| com.amazonaws | aws-java-sdk-ecs | 1.12.390 |\n| com.amazonaws | aws-java-sdk-efs | 1.12.390 |\n| com.amazonaws | aws-java-sdk-elasticache | 1.12.390 |\n| com.amazonaws | 
aws-java-sdk-elasticbeanstalk | 1.12.390 |\n| com.amazonaws | aws-java-sdk-elasticloadbalancing | 1.12.390 |\n| com.amazonaws | aws-java-sdk-elastictranscoder | 1.12.390 |\n| com.amazonaws | aws-java-sdk-emr | 1.12.390 |\n| com.amazonaws | aws-java-sdk-glacier | 1.12.390 |\n| com.amazonaws | aws-java-sdk-glue | 1.12.390 |\n| com.amazonaws | aws-java-sdk-iam | 1.12.390 |\n| com.amazonaws | aws-java-sdk-importexport | 1.12.390 |\n| com.amazonaws | aws-java-sdk-kinesis | 1.12.390 |\n| com.amazonaws | aws-java-sdk-kms | 1.12.390 |\n| com.amazonaws | aws-java-sdk-lambda | 1.12.390 |\n| com.amazonaws | aws-java-sdk-logs | 1.12.390 |\n| com.amazonaws | aws-java-sdk-machinelearning | 1.12.390 |\n| com.amazonaws | aws-java-sdk-opsworks | 1.12.390 |\n| com.amazonaws | aws-java-sdk-rds | 1.12.390 |\n| com.amazonaws | aws-java-sdk-redshift | 1.12.390 |\n| com.amazonaws | aws-java-sdk-route53 | 1.12.390 |\n| com.amazonaws | aws-java-sdk-s3 | 1.12.390 |\n| com.amazonaws | aws-java-sdk-ses | 1.12.390 |\n| com.amazonaws | aws-java-sdk-simpledb | 1.12.390 |\n| com.amazonaws | aws-java-sdk-simpleworkflow | 1.12.390 |\n| com.amazonaws | aws-java-sdk-sns | 1.12.390 |\n| com.amazonaws | aws-java-sdk-sqs | 1.12.390 |\n| com.amazonaws | aws-java-sdk-ssm | 1.12.390 |\n| com.amazonaws | aws-java-sdk-storagegateway | 1.12.390 |\n| com.amazonaws | aws-java-sdk-sts | 1.12.390 |\n| com.amazonaws | aws-java-sdk-support | 1.12.390 |\n| com.amazonaws | aws-java-sdk-swf-libraries | 1.11.22 |\n| com.amazonaws | aws-java-sdk-workspaces | 1.12.390 |\n| com.amazonaws | jmespath-java | 1.12.390 |\n| com.clearspring.analytics | stream | 2.9.6 |\n| com.databricks | Rserve | 1.8-3 |\n| com.databricks | databricks-sdk-java | 0.7.0 |\n| com.databricks | jets3t | 0.7.1-0 |\n| com.databricks.scalapb | compilerplugin\\_2.12 | 0.4.15-10 |\n| com.databricks.scalapb | scalapb-runtime\\_2.12 | 0.4.15-10 |\n| com.esotericsoftware | kryo-shaded | 4.0.2 |\n| com.esotericsoftware | minlog | 1.3.0 |\n| com.fasterxml | 
classmate | 1.3.4 |\n| com.fasterxml.jackson.core | jackson-annotations | 2.15.2 |\n| com.fasterxml.jackson.core | jackson-core | 2.15.2 |\n| com.fasterxml.jackson.core | jackson-databind | 2.15.2 |\n| com.fasterxml.jackson.dataformat | jackson-dataformat-cbor | 2.15.2 |\n| com.fasterxml.jackson.datatype | jackson-datatype-joda | 2.15.2 |\n| com.fasterxml.jackson.datatype | jackson-datatype-jsr310 | 2.15.1 |\n| com.fasterxml.jackson.module | jackson-module-paranamer | 2.15.2 |\n| com.fasterxml.jackson.module | jackson-module-scala\\_2.12 | 2.15.2 |\n| com.github.ben-manes.caffeine | caffeine | 2.9.3 |\n| com.github.fommil | jniloader | 1.1 |\n| com.github.fommil.netlib | native\\_ref-java | 1.1 |\n| com.github.fommil.netlib | native\\_ref-java | 1.1-natives |\n| com.github.fommil.netlib | native\\_system-java | 1.1 |\n| com.github.fommil.netlib | native\\_system-java | 1.1-natives |\n| com.github.fommil.netlib | netlib-native\\_ref-linux-x86\\_64 | 1.1-natives |\n| com.github.fommil.netlib | netlib-native\\_system-linux-x86\\_64 | 1.1-natives |\n| com.github.luben | zstd-jni | 1.5.5-4 |\n| com.github.wendykierp | JTransforms | 3.1 |\n| com.google.code.findbugs | jsr305 | 3.0.0 |\n| com.google.code.gson | gson | 2.10.1 |\n| com.google.crypto.tink | tink | 1.9.0 |\n| com.google.errorprone | error\\_prone\\_annotations | 2.10.0 |\n| com.google.flatbuffers | flatbuffers-java | 1.12.0 |\n| com.google.guava | guava | 15.0 |\n| com.google.protobuf | protobuf-java | 2.6.1 |\n| com.helger | profiler | 1.1.1 |\n| com.jcraft | jsch | 0.1.55 |\n| com.jolbox | bonecp | 0.8.0.RELEASE |\n| com.lihaoyi | sourcecode\\_2.12 | 0.1.9 |\n| com.microsoft.azure | azure-data-lake-store-sdk | 2.3.9 |\n| com.microsoft.sqlserver | mssql-jdbc | 11.2.2.jre8 |\n| com.ning | compress-lzf | 1.1.2 |\n| com.sun.mail | javax.mail | 1.5.2 |\n| com.sun.xml.bind | jaxb-core | 2.2.11 |\n| com.sun.xml.bind | jaxb-impl | 2.2.11 |\n| com.tdunning | json | 1.8 |\n| com.thoughtworks.paranamer | paranamer | 
2.8 |\n| com.trueaccord.lenses | lenses\\_2.12 | 0.4.12 |\n| com.twitter | chill-java | 0.10.0 |\n| com.twitter | chill\\_2.12 | 0.10.0 |\n| com.twitter | util-app\\_2.12 | 7.1.0 |\n| com.twitter | util-core\\_2.12 | 7.1.0 |\n| com.twitter | util-function\\_2.12 | 7.1.0 |\n| com.twitter | util-jvm\\_2.12 | 7.1.0 |\n| com.twitter | util-lint\\_2.12 | 7.1.0 |\n| com.twitter | util-registry\\_2.12 | 7.1.0 |\n| com.twitter | util-stats\\_2.12 | 7.1.0 |\n| com.typesafe | config | 1.2.1 |\n| com.typesafe.scala-logging | scala-logging\\_2.12 | 3.7.2 |\n| com.uber | h3 | 3.7.3 |\n| com.univocity | univocity-parsers | 2.9.1 |\n| com.zaxxer | HikariCP | 4.0.3 |\n| commons-cli | commons-cli | 1.5.0 |\n| commons-codec | commons-codec | 1.16.0 |\n| commons-collections | commons-collections | 3.2.2 |\n| commons-dbcp | commons-dbcp | 1.4 |\n| commons-fileupload | commons-fileupload | 1.5 |\n| commons-httpclient | commons-httpclient | 3.1 |\n| commons-io | commons-io | 2.13.0 |\n| commons-lang | commons-lang | 2.6 |\n| commons-logging | commons-logging | 1.1.3 |\n| commons-pool | commons-pool | 1.5.4 |\n| dev.ludovic.netlib | arpack | 3.0.3 |\n| dev.ludovic.netlib | blas | 3.0.3 |\n| dev.ludovic.netlib | lapack | 3.0.3 |\n| info.ganglia.gmetric4j | gmetric4j | 1.0.10 |\n| io.airlift | aircompressor | 0.25 |\n| io.delta | delta-sharing-spark\\_2.12 | 0.7.5 |\n| io.dropwizard.metrics | metrics-annotation | 4.2.19 |\n| io.dropwizard.metrics | metrics-core | 4.2.19 |\n| io.dropwizard.metrics | metrics-graphite | 4.2.19 |\n| io.dropwizard.metrics | metrics-healthchecks | 4.2.19 |\n| io.dropwizard.metrics | metrics-jetty9 | 4.2.19 |\n| io.dropwizard.metrics | metrics-jmx | 4.2.19 |\n| io.dropwizard.metrics | metrics-json | 4.2.19 |\n| io.dropwizard.metrics | metrics-jvm | 4.2.19 |\n| io.dropwizard.metrics | metrics-servlets | 4.2.19 |\n| io.netty | netty-all | 4.1.96.Final |\n| io.netty | netty-buffer | 4.1.96.Final |\n| io.netty | netty-codec | 4.1.96.Final |\n| io.netty | 
netty-codec-http | 4.1.96.Final |\n| io.netty | netty-codec-http2 | 4.1.96.Final |\n| io.netty | netty-codec-socks | 4.1.96.Final |\n| io.netty | netty-common | 4.1.96.Final |\n| io.netty | netty-handler | 4.1.96.Final |\n| io.netty | netty-handler-proxy | 4.1.96.Final |\n| io.netty | netty-resolver | 4.1.96.Final |\n| io.netty | netty-tcnative-boringssl-static | 2.0.61.Final-linux-aarch\\_64 |\n| io.netty | netty-tcnative-boringssl-static | 2.0.61.Final-linux-x86\\_64 |\n| io.netty | netty-tcnative-boringssl-static | 2.0.61.Final-osx-aarch\\_64 |\n| io.netty | netty-tcnative-boringssl-static | 2.0.61.Final-osx-x86\\_64 |\n| io.netty | netty-tcnative-boringssl-static | 2.0.61.Final-windows-x86\\_64 |\n| io.netty | netty-tcnative-classes | 2.0.61.Final |\n| io.netty | netty-transport | 4.1.96.Final |\n| io.netty | netty-transport-classes-epoll | 4.1.96.Final |\n| io.netty | netty-transport-classes-kqueue | 4.1.96.Final |\n| io.netty | netty-transport-native-epoll | 4.1.96.Final |\n| io.netty | netty-transport-native-epoll | 4.1.96.Final-linux-aarch\\_64 |\n| io.netty | netty-transport-native-epoll | 4.1.96.Final-linux-x86\\_64 |\n| io.netty | netty-transport-native-kqueue | 4.1.96.Final-osx-aarch\\_64 |\n| io.netty | netty-transport-native-kqueue | 4.1.96.Final-osx-x86\\_64 |\n| io.netty | netty-transport-native-unix-common | 4.1.96.Final |\n| io.prometheus | simpleclient | 0.7.0 |\n| io.prometheus | simpleclient\\_common | 0.7.0 |\n| io.prometheus | simpleclient\\_dropwizard | 0.7.0 |\n| io.prometheus | simpleclient\\_pushgateway | 0.7.0 |\n| io.prometheus | simpleclient\\_servlet | 0.7.0 |\n| io.prometheus.jmx | collector | 0.12.0 |\n| jakarta.annotation | jakarta.annotation-api | 1.3.5 |\n| jakarta.servlet | jakarta.servlet-api | 4.0.3 |\n| jakarta.validation | jakarta.validation-api | 2.0.2 |\n| jakarta.ws.rs | jakarta.ws.rs-api | 2.1.6 |\n| javax.activation | activation | 1.1.1 |\n| javax.el | javax.el-api | 2.2.4 |\n| javax.jdo | jdo-api | 3.0.1 |\n| 
javax.transaction | jta | 1.1 |\n| javax.transaction | transaction-api | 1.1 |\n| javax.xml.bind | jaxb-api | 2.2.11 |\n| javolution | javolution | 5.5.1 |\n| jline | jline | 2.14.6 |\n| joda-time | joda-time | 2.12.1 |\n| net.java.dev.jna | jna | 5.8.0 |\n| net.razorvine | pickle | 1.3 |\n| net.sf.jpam | jpam | 1.1 |\n| net.sf.opencsv | opencsv | 2.3 |\n| net.sf.supercsv | super-csv | 2.2.0 |\n| net.snowflake | snowflake-ingest-sdk | 0.9.6 |\n| net.snowflake | snowflake-jdbc | 3.13.33 |\n| net.sourceforge.f2j | arpack\\_combined\\_all | 0.1 |\n| org.acplt.remotetea | remotetea-oncrpc | 1.1.2 |\n| org.antlr | ST4 | 4.0.4 |\n| org.antlr | antlr-runtime | 3.5.2 |\n| org.antlr | antlr4-runtime | 4.9.3 |\n| org.antlr | stringtemplate | 3.2.1 |\n| org.apache.ant | ant | 1.9.16 |\n| org.apache.ant | ant-jsch | 1.9.16 |\n| org.apache.ant | ant-launcher | 1.9.16 |\n| org.apache.arrow | arrow-format | 12.0.1 |\n| org.apache.arrow | arrow-memory-core | 12.0.1 |\n| org.apache.arrow | arrow-memory-netty | 12.0.1 |\n| org.apache.arrow | arrow-vector | 12.0.1 |\n| org.apache.avro | avro | 1.11.2 |\n| org.apache.avro | avro-ipc | 1.11.2 |\n| org.apache.avro | avro-mapred | 1.11.2 |\n| org.apache.commons | commons-collections4 | 4.4 |\n| org.apache.commons | commons-compress | 1.23.0 |\n| org.apache.commons | commons-crypto | 1.1.0 |\n| org.apache.commons | commons-lang3 | 3.12.0 |\n| org.apache.commons | commons-math3 | 3.6.1 |\n| org.apache.commons | commons-text | 1.10.0 |\n| org.apache.curator | curator-client | 2.13.0 |\n| org.apache.curator | curator-framework | 2.13.0 |\n| org.apache.curator | curator-recipes | 2.13.0 |\n| org.apache.datasketches | datasketches-java | 3.1.0 |\n| org.apache.datasketches | datasketches-memory | 2.0.0 |\n| org.apache.derby | derby | 10.14.2.0 |\n| org.apache.hadoop | hadoop-client-runtime | 3.3.6 |\n| org.apache.hive | hive-beeline | 2.3.9 |\n| org.apache.hive | hive-cli | 2.3.9 |\n| org.apache.hive | hive-jdbc | 2.3.9 |\n| org.apache.hive | 
hive-llap-client | 2.3.9 |\n| org.apache.hive | hive-llap-common | 2.3.9 |\n| org.apache.hive | hive-serde | 2.3.9 |\n| org.apache.hive | hive-shims | 2.3.9 |\n| org.apache.hive | hive-storage-api | 2.8.1 |\n| org.apache.hive.shims | hive-shims-0.23 | 2.3.9 |\n| org.apache.hive.shims | hive-shims-common | 2.3.9 |\n| org.apache.hive.shims | hive-shims-scheduler | 2.3.9 |\n| org.apache.httpcomponents | httpclient | 4.5.14 |\n| org.apache.httpcomponents | httpcore | 4.4.16 |\n| org.apache.ivy | ivy | 2.5.1 |\n| org.apache.logging.log4j | log4j-1.2-api | 2.20.0 |\n| org.apache.logging.log4j | log4j-api | 2.20.0 |\n| org.apache.logging.log4j | log4j-core | 2.20.0 |\n| org.apache.logging.log4j | log4j-slf4j2-impl | 2.20.0 |\n| org.apache.mesos | mesos | 1.11.0-shaded-protobuf |\n| org.apache.orc | orc-core | 1.9.1-shaded-protobuf |\n| org.apache.orc | orc-mapreduce | 1.9.1-shaded-protobuf |\n| org.apache.orc | orc-shims | 1.9.1 |\n| org.apache.thrift | libfb303 | 0.9.3 |\n| org.apache.thrift | libthrift | 0.12.0 |\n| org.apache.ws.xmlschema | xmlschema-core | 2.3.0 |\n| org.apache.xbean | xbean-asm9-shaded | 4.23 |\n| org.apache.yetus | audience-annotations | 0.13.0 |\n| org.apache.zookeeper | zookeeper | 3.6.3 |\n| org.apache.zookeeper | zookeeper-jute | 3.6.3 |\n| org.checkerframework | checker-qual | 3.31.0 |\n| org.codehaus.jackson | jackson-core-asl | 1.9.13 |\n| org.codehaus.jackson | jackson-mapper-asl | 1.9.13 |\n| org.codehaus.janino | commons-compiler | 3.0.16 |\n| org.codehaus.janino | janino | 3.0.16 |\n| org.datanucleus | datanucleus-api-jdo | 4.2.4 |\n| org.datanucleus | datanucleus-core | 4.1.17 |\n| org.datanucleus | datanucleus-rdbms | 4.1.19 |\n| org.datanucleus | javax.jdo | 3.2.0-m3 |\n| org.eclipse.jetty | jetty-client | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-continuation | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-http | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-io | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-jndi | 
9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-plus | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-proxy | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-security | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-server | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-servlet | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-servlets | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-util | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-util-ajax | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-webapp | 9.4.52.v20230823 |\n| org.eclipse.jetty | jetty-xml | 9.4.52.v20230823 |\n| org.eclipse.jetty.websocket | websocket-api | 9.4.52.v20230823 |\n| org.eclipse.jetty.websocket | websocket-client | 9.4.52.v20230823 |\n| org.eclipse.jetty.websocket | websocket-common | 9.4.52.v20230823 |\n| org.eclipse.jetty.websocket | websocket-server | 9.4.52.v20230823 |\n| org.eclipse.jetty.websocket | websocket-servlet | 9.4.52.v20230823 |\n| org.fusesource.leveldbjni | leveldbjni-all | 1.8 |\n| org.glassfish.hk2 | hk2-api | 2.6.1 |\n| org.glassfish.hk2 | hk2-locator | 2.6.1 |\n| org.glassfish.hk2 | hk2-utils | 2.6.1 |\n| org.glassfish.hk2 | osgi-resource-locator | 1.0.3 |\n| org.glassfish.hk2.external | aopalliance-repackaged | 2.6.1 |\n| org.glassfish.hk2.external | jakarta.inject | 2.6.1 |\n| org.glassfish.jersey.containers | jersey-container-servlet | 2.40 |\n| org.glassfish.jersey.containers | jersey-container-servlet-core | 2.40 |\n| org.glassfish.jersey.core | jersey-client | 2.40 |\n| org.glassfish.jersey.core | jersey-common | 2.40 |\n| org.glassfish.jersey.core | jersey-server | 2.40 |\n| org.glassfish.jersey.inject | jersey-hk2 | 2.40 |\n| org.hibernate.validator | hibernate-validator | 6.1.7.Final |\n| org.ini4j | ini4j | 0.5.4 |\n| org.javassist | javassist | 3.29.2-GA |\n| org.jboss.logging | jboss-logging | 3.3.2.Final |\n| org.jdbi | jdbi | 2.63.1 |\n| org.jetbrains | annotations | 17.0.0 |\n| org.joda | joda-convert | 1.7 |\n| org.jodd | jodd-core | 3.5.2 |\n| 
org.json4s | json4s-ast\\_2.12 | 3.7.0-M11 |\n| org.json4s | json4s-core\\_2.12 | 3.7.0-M11 |\n| org.json4s | json4s-jackson\\_2.12 | 3.7.0-M11 |\n| org.json4s | json4s-scalap\\_2.12 | 3.7.0-M11 |\n| org.lz4 | lz4-java | 1.8.0 |\n| org.mariadb.jdbc | mariadb-java-client | 2.7.9 |\n| org.mlflow | mlflow-spark | 2.2.0 |\n| org.objenesis | objenesis | 2.5.1 |\n| org.postgresql | postgresql | 42.6.0 |\n| org.roaringbitmap | RoaringBitmap | 0.9.45 |\n| org.roaringbitmap | shims | 0.9.45 |\n| org.rocksdb | rocksdbjni | 8.3.2 |\n| org.rosuda.REngine | REngine | 2.1.0 |\n| org.scala-lang | scala-compiler\\_2.12 | 2.12.15 |\n| org.scala-lang | scala-library\\_2.12 | 2.12.15 |\n| org.scala-lang | scala-reflect\\_2.12 | 2.12.15 |\n| org.scala-lang.modules | scala-collection-compat\\_2.12 | 2.9.0 |\n| org.scala-lang.modules | scala-parser-combinators\\_2.12 | 1.1.2 |\n| org.scala-lang.modules | scala-xml\\_2.12 | 1.2.0 |\n| org.scala-sbt | test-interface | 1.0 |\n| org.scalacheck | scalacheck\\_2.12 | 1.14.2 |\n| org.scalactic | scalactic\\_2.12 | 3.2.15 |\n| org.scalanlp | breeze-macros\\_2.12 | 2.1.0 |\n| org.scalanlp | breeze\\_2.12 | 2.1.0 |\n| org.scalatest | scalatest-compatible | 3.2.15 |\n| org.scalatest | scalatest-core\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest-diagrams\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest-featurespec\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest-flatspec\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest-freespec\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest-funspec\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest-funsuite\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest-matchers-core\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest-mustmatchers\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest-propspec\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest-refspec\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest-shouldmatchers\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest-wordspec\\_2.12 | 3.2.15 |\n| org.scalatest | scalatest\\_2.12 | 3.2.15 |\n| 
org.slf4j | jcl-over-slf4j | 2.0.7 |\n| org.slf4j | jul-to-slf4j | 2.0.7 |\n| org.slf4j | slf4j-api | 2.0.7 |\n| org.threeten | threeten-extra | 1.7.1 |\n| org.tukaani | xz | 1.9 |\n| org.typelevel | algebra\\_2.12 | 2.0.1 |\n| org.typelevel | cats-kernel\\_2.12 | 2.1.1 |\n| org.typelevel | spire-macros\\_2.12 | 0.17.0 |\n| org.typelevel | spire-platform\\_2.12 | 0.17.0 |\n| org.typelevel | spire-util\\_2.12 | 0.17.0 |\n| org.typelevel | spire\\_2.12 | 0.17.0 |\n| org.wildfly.openssl | wildfly-openssl | 1.1.3.Final |\n| org.xerial | sqlite-jdbc | 3.42.0.0 |\n| org.xerial.snappy | snappy-java | 1.1.10.3 |\n| org.yaml | snakeyaml | 2.0 |\n| oro | oro | 2.0.8 |\n| pl.edu.icm | JLargeArrays | 1.5 |\n| software.amazon.cryptools | AmazonCorrettoCryptoProvider | 1.6.1-linux-x86\\_64 |\n| software.amazon.ion | ion-java | 1.0.2 |\n| stax | stax-api | 1.0.1 |\n\n", "chunk_id": "70117560fde7824e183f9b9267441f65", "url": "https://docs.databricks.com/release-notes/runtime/14.1.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `try_sum` aggregate function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 11.3 LTS and above \nReturns the sum calculated from values of a group, or NULL if there is an overflow.\n\n####### `try_sum` aggregate function\n######## Syntax\n\n```\ntry_sum ( [ALL | DISTINCT] expr ) [FILTER ( WHERE cond ) ]\n\n``` \nThis function can also be invoked as a [window function](https://docs.databricks.com/sql/language-manual/sql-ref-window-functions.html) using the `OVER` clause.\n\n####### `try_sum` aggregate function\n######## Arguments\n\n* `expr`: An expression that evaluates to a numeric or interval.\n* `cond`: An optional boolean expression filtering the rows used for aggregation.\n\n####### `try_sum` 
aggregate function\n######## Returns\n\nIf `expr` is an integral number type, a BIGINT. \nIf `expr` is `DECIMAL(p, s)`, the result is `DECIMAL(p + min(10, 31-p), s)`. \nIf `expr` is an interval, the result type matches `expr`. \nOtherwise, a DOUBLE. \nIf `DISTINCT` is specified, only unique values are summed. \nIf the result overflows the result type, Databricks SQL returns NULL. To return an error instead, use [sum](https://docs.databricks.com/sql/language-manual/functions/sum.html).\n\n", "chunk_id": "0ade9f657da95100934df5c13015f0c2", "url": "https://docs.databricks.com/sql/language-manual/functions/try_sum.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `try_sum` aggregate function\n######## Examples\n\n```\n> SELECT try_sum(col) FROM VALUES (5), (10), (15) AS tab(col);\n30\n\n> SELECT try_sum(col) FILTER(WHERE col <15)\nFROM VALUES (5), (10), (15) AS tab(col);\n15\n\n> SELECT try_sum(DISTINCT col) FROM VALUES (5), (10), (10), (15) AS tab(col);\n30\n\n> SELECT try_sum(col) FROM VALUES (NULL), (10), (15) AS tab(col);\n25\n\n> SELECT try_sum(col) FROM VALUES (NULL), (NULL) AS tab(col);\nNULL\n\n-- try_sum overflows a BIGINT\n> SELECT try_sum(c1) FROM VALUES(5E18::BIGINT), (5E18::BIGINT) AS tab(c1);\nNULL\n\n-- In ANSI mode sum returns an error if it overflows BIGINT\n> SELECT sum(c1) FROM VALUES(5E18::BIGINT), (5E18::BIGINT) AS tab(c1);\nERROR\n\n-- try_sum overflows an INTERVAL\n> SELECT try_sum(c1) FROM VALUES(INTERVAL '100000000' YEARS), (INTERVAL '100000000' YEARS) AS tab(c1);\nNULL\n\n-- sum returns an error on INTERVAL overflow\n> SELECT sum(c1) FROM VALUES(INTERVAL '100000000' YEARS), (INTERVAL '100000000' YEARS) AS tab(c1);\nError: ARITHMETIC_OVERFLOW\n\n```\n\n", "chunk_id": "30de50c0a6fb60a50000171c6d960a65", "url": "https://docs.databricks.com/sql/language-manual/functions/try_sum.html"} +{"chunked_text": "# Develop on 
Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `try_sum` aggregate function\n######## Related\n\n* [aggregate function](https://docs.databricks.com/sql/language-manual/functions/aggregate.html)\n* [avg aggregate function](https://docs.databricks.com/sql/language-manual/functions/avg.html)\n* [max aggregate function](https://docs.databricks.com/sql/language-manual/functions/max.html)\n* [mean aggregate function](https://docs.databricks.com/sql/language-manual/functions/mean.html)\n* [min aggregate function](https://docs.databricks.com/sql/language-manual/functions/min.html)\n* [sum aggregate function](https://docs.databricks.com/sql/language-manual/functions/sum.html)\n* [Window functions](https://docs.databricks.com/sql/language-manual/sql-ref-window-functions.html)\n\n", "chunk_id": "d862eb8eda70867efdd5de74c373bb5a", "url": "https://docs.databricks.com/sql/language-manual/functions/try_sum.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is Databricks Connect?\n##### Databricks Connect for Databricks Runtime 12.2 LTS and below\n\nNote \nDatabricks Connect recommends that you use [Databricks Connect for Databricks Runtime 13.0 and above](https://docs.databricks.com/dev-tools/databricks-connect/index.html) instead. \nDatabricks plans no new feature work for Databricks Connect for Databricks Runtime 12.2 LTS and below. \nDatabricks Connect allows you to connect popular IDEs such as Visual Studio Code and PyCharm, notebook servers, and other custom applications to Databricks clusters. 
\nThis article explains how Databricks Connect works, walks you through the steps to get started with Databricks Connect, explains how to troubleshoot issues that may arise when using Databricks Connect, and describes the differences between running code with Databricks Connect and running it in a Databricks notebook.\n\n", "chunk_id": "9267d7d935c56814fc07fcba2a1e04ff", "url": "https://docs.databricks.com/dev-tools/databricks-connect-legacy.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is Databricks Connect?\n##### Databricks Connect for Databricks Runtime 12.2 LTS and below\n###### Overview\n\nDatabricks Connect is a client library for the Databricks Runtime. It allows you to write jobs using Spark APIs and run them remotely on a Databricks cluster instead of in the local Spark session. \nFor example, when you run the DataFrame command `spark.read.format(...).load(...).groupBy(...).agg(...).show()` using Databricks Connect, the logical representation of the command is sent to the Spark server running in Databricks for execution on the remote cluster. \nWith Databricks Connect, you can: \n* Run large-scale Spark jobs from any Python, R, Scala, or Java application. Anywhere you can `import pyspark`, `require(SparkR)` or `import org.apache.spark`, you can now run Spark jobs directly from your application, without needing to install any IDE plugins or use Spark submission scripts.\n* Step through and debug code in your IDE even when working with a remote cluster.\n* Iterate quickly when developing libraries. You do not need to restart the cluster after changing Python or Java library dependencies in Databricks Connect, because each client session is isolated from the others in the cluster.\n* Shut down idle clusters without losing work. 
Because the client application is decoupled from the cluster, it is unaffected by cluster restarts or upgrades, which would normally cause you to lose all the variables, RDDs, and DataFrame objects defined in a notebook. \nNote \nFor Python development with SQL queries, Databricks recommends that you use the [Databricks SQL Connector for Python](https://docs.databricks.com/dev-tools/python-sql-connector.html) instead of Databricks Connect. The Databricks SQL Connector for Python is easier to set up than Databricks Connect. Also, Databricks Connect parses and plans jobs on your local machine, while the jobs run on remote compute resources. This can make it especially difficult to debug runtime errors. The Databricks SQL Connector for Python submits SQL queries directly to remote compute resources and fetches results.\n\n", "chunk_id": "e48eb046924497a2860677f7c7f7793d", "url": "https://docs.databricks.com/dev-tools/databricks-connect-legacy.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is Databricks Connect?\n##### Databricks Connect for Databricks Runtime 12.2 LTS and below\n###### Requirements\n\nThis section lists the requirements for Databricks Connect. \n* Only the following Databricks Runtime versions are supported: \n+ Databricks Runtime 12.2 LTS ML, Databricks Runtime 12.2 LTS\n+ Databricks Runtime 11.3 LTS ML, Databricks Runtime 11.3 LTS\n+ Databricks Runtime 10.4 LTS ML, Databricks Runtime 10.4 LTS\n+ Databricks Runtime 9.1 LTS ML, Databricks Runtime 9.1 LTS\n+ Databricks Runtime 7.3 LTS\n* You must install Python 3 on your development machine, and the minor version of your client Python installation must be the same as the minor Python version of your Databricks cluster. The following table shows the Python version installed with each Databricks Runtime. 
\n| Databricks Runtime version | Python version |\n| --- | --- |\n| 12.2 LTS ML, 12.2 LTS | 3.9 |\n| 11.3 LTS ML, 11.3 LTS | 3.9 |\n| 10.4 LTS ML, 10.4 LTS | 3.8 |\n| 9.1 LTS ML, 9.1 LTS | 3.8 |\n| 7.3 LTS | 3.7 | \nDatabricks strongly recommends that you have a Python *virtual environment* activated for each Python version that you use with Databricks Connect. Python virtual environments help to make sure that you are using the correct versions of Python and Databricks Connect together. This can help to reduce the time spent resolving related technical issues. \nFor example, if you\u2019re using [venv](https://docs.python.org/3/library/venv.html) on your development machine and your cluster is running Python 3.9, you must create a `venv` environment with that version. The following example command generates the scripts to activate a `venv` environment with Python 3.9, and this command then places those scripts within a hidden folder named `.venv` within the current working directory: \n```\n# Linux and macOS\npython3.9 -m venv ./.venv\n\n# Windows\npython3.9 -m venv .\\.venv\n\n``` \nTo use these scripts to activate this `venv` environment, see [How venvs work](https://docs.python.org/3/library/venv.html#how-venvs-work). \nAs another example, if you\u2019re using [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) on your development machine and your cluster is running Python 3.9, you must create a Conda environment with that version, for example: \n```\nconda create --name dbconnect python=3.9\n\n``` \nTo activate the Conda environment with this environment name, run `conda activate dbconnect`.\n* The Databricks Connect major and minor package version must always match your Databricks Runtime version. Databricks recommends that you always use the most recent package of Databricks Connect that matches your Databricks Runtime version. 
For example, when you use a Databricks Runtime 12.2 LTS cluster, you must also use the `databricks-connect==12.2.*` package. \nNote \nSee the [Databricks Connect release notes](https://docs.databricks.com/release-notes/dbconnect/index.html) for a list of available Databricks Connect releases and maintenance updates.\n* Java Runtime Environment (JRE) 8. The client has been tested with the OpenJDK 8 JRE. The client does not support Java 11. \nNote \nOn Windows, if you see an error that Databricks Connect cannot find `winutils.exe`, see [Cannot find winutils.exe on Windows](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#cannot-find-winutilsexe-on-windows-legacy).\n\n", "chunk_id": "812e000557bec601f1c460cb0eb5b303", "url": "https://docs.databricks.com/dev-tools/databricks-connect-legacy.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is Databricks Connect?\n##### Databricks Connect for Databricks Runtime 12.2 LTS and below\n###### Set up the client\n\nComplete the following steps to set up the local client for Databricks Connect. \nNote \nBefore you begin to set up the local Databricks Connect client, you must [meet the requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements-legacy) for Databricks Connect. \n### Step 1: Install the Databricks Connect client \n1. With your virtual environment activated, uninstall PySpark, if it is already installed, by running the `uninstall` command. This is required because the `databricks-connect` package conflicts with PySpark. For details, see [Conflicting PySpark installations](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#conflicting-pyspark-installations-legacy). To check whether PySpark is already installed, run the `show` command. \n```\n# Is PySpark already installed?\npip3 show pyspark\n\n# Uninstall PySpark\npip3 uninstall pyspark\n\n```\n2. 
With your virtual environment still activated, install the Databricks Connect client by running the `install` command. Use the `--upgrade` option to upgrade any existing client installation to the specified version. \n```\npip3 install --upgrade \"databricks-connect==12.2.*\" # Or X.Y.* to match your cluster version.\n\n``` \nNote \nDatabricks recommends that you append the \u201cdot-asterisk\u201d notation to specify `databricks-connect==X.Y.*` instead of `databricks-connect==X.Y`, to make sure that the most recent package is installed. \n### Step 2: Configure connection properties \n1. Collect the following configuration properties. \n* The Databricks [workspace URL](https://docs.databricks.com/workspace/workspace-details.html#workspace-url).\n* Your Databricks [personal access token](https://docs.databricks.com/dev-tools/auth/pat.html).\n* The ID of your cluster. You can obtain the cluster ID from the URL. Here the cluster ID is `0304-201045-hoary804`. \n![Cluster ID 2](https://docs.databricks.com/_images/cluster-id-aws.png)\n* The port that Databricks Connect connects to on your cluster. The default port is `15001`.\n2. Configure the connection as follows. \nYou can use the CLI, SQL configs, or environment variables. The precedence of configuration methods from highest to lowest is: SQL config keys, CLI, and environment variables. \n* CLI \n1. Run `databricks-connect configure`. \n```\ndatabricks-connect configure\n\n``` \nThe license displays: \n```\nCopyright (2018) Databricks, Inc.\n\nThis library (the \"Software\") may not be used except in connection with the\nLicensee's use of the Databricks Platform Services pursuant to an Agreement\n...\n\n```\n2. Accept the license and supply configuration values. For **Databricks Host** and **Databricks Token**, enter the workspace URL and the personal access token you noted in Step 1. \n```\nDo you accept the above agreement? 
[y/N] y\nSet new config values (leave input empty to accept default):\nDatabricks Host [no current value, must start with https://]: \nDatabricks Token [no current value]: \nCluster ID (e.g., 0921-001415-jelly628) [no current value]: \nOrg ID (Azure-only, see ?o=orgId in URL) [0]: \nPort [15001]: \n\n```\n* SQL configs or environment variables. The following table shows the SQL config keys and the environment variables that correspond to the configuration properties you noted in Step 1. To set a SQL config key, use `sql(\"set config=value\")`. For example: `sql(\"set spark.databricks.service.clusterId=0304-201045-abcdefgh\")`. \n| Parameter | SQL config key | Environment variable name |\n| --- | --- | --- |\n| Databricks Host | spark.databricks.service.address | DATABRICKS\\_ADDRESS |\n| Databricks Token | spark.databricks.service.token | DATABRICKS\\_API\\_TOKEN |\n| Cluster ID | spark.databricks.service.clusterId | DATABRICKS\\_CLUSTER\\_ID |\n| Org ID | spark.databricks.service.orgId | DATABRICKS\\_ORG\\_ID |\n| Port | spark.databricks.service.port | DATABRICKS\\_PORT |\n3. With your virtual environment still activated, test connectivity to Databricks as follows. \n```\ndatabricks-connect test\n\n``` \nIf the cluster you configured is not running, the test starts the cluster which will remain running until its configured autotermination time. The output should look similar to the following: \n```\n* PySpark is installed at /.../.../pyspark\n* Checking java version\njava version \"1.8...\"\nJava(TM) SE Runtime Environment (build 1.8...)\nJava HotSpot(TM) 64-Bit Server VM (build 25..., mixed mode)\n* Testing scala command\n../../.. ..:..:.. WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\nUsing Spark's default log4j profile: org/apache/spark/log4j-defaults.properties\nSetting default log level to \"WARN\".\nTo adjust logging level use sc.setLogLevel(newLevel). 
For SparkR, use setLogLevel(newLevel).\n../../.. ..:..:.. WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set.\n../../.. ..:..:.. WARN SparkServiceRPCClient: Now tracking server state for 5ab..., invalidating prev state\n../../.. ..:..:.. WARN SparkServiceRPCClient: Syncing 129 files (176036 bytes) took 3003 ms\nWelcome to\n____ __\n/ __/__ ___ _____/ /__\n_\\ \\/ _ \\/ _ `/ __/ '_/\n/___/ .__/\\_,_/_/ /_/\\_\\ version 2...\n/_/\n\nUsing Scala version 2.... (Java HotSpot(TM) 64-Bit Server VM, Java 1.8...)\nType in expressions to have them evaluated.\nType :help for more information.\n\nscala> spark.range(100).reduce(_ + _)\nSpark context Web UI available at https://...\nSpark context available as 'sc' (master = local[*], app id = local-...).\nSpark session available as 'spark'.\nView job details at /?o=0#/setting/clusters//sparkUi\nView job details at ?o=0#/setting/clusters//sparkUi\nres0: Long = 4950\n\nscala> :quit\n\n* Testing python command\n../../.. ..:..:.. WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\nUsing Spark's default log4j profile: org/apache/spark/log4j-defaults.properties\nSetting default log level to \"WARN\".\nTo adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n../../.. ..:..:.. WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set.\n../../.. ..:..:.. WARN SparkServiceRPCClient: Now tracking server state for 5ab.., invalidating prev state\nView job details at /?o=0#/setting/clusters//sparkUi\n\n```\n4. 
If no connection-related errors are shown (`WARN` messages are okay), then you have successfully connected.\n\n", "chunk_id": "c2b8e9a46a3178e8f212dac803398a17", "url": "https://docs.databricks.com/dev-tools/databricks-connect-legacy.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is Databricks Connect?\n##### Databricks Connect for Databricks Runtime 12.2 LTS and below\n###### Use Databricks Connect\n\nThis section describes how to configure your preferred IDE or notebook server to use the client for Databricks Connect. \nIn this section: \n* [JupyterLab](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#jupyterlab)\n* [Classic Jupyter Notebook](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#classic-jupyter-notebook)\n* [PyCharm](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#pycharm)\n* [SparkR and RStudio Desktop](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#sparkr-and-rstudio-desktop)\n* [sparklyr and RStudio Desktop](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#sparklyr-and-rstudio-desktop)\n* [IntelliJ (Scala or Java)](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#intellij-scala-or-java)\n* [PyDev with Eclipse](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#pydev-with-eclipse)\n* [Eclipse](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#eclipse)\n* [SBT](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#sbt)\n* [Spark shell](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#spark-shell) \n### [JupyterLab](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id4) \nNote \nBefore you begin to use Databricks Connect, you must [meet the requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements-legacy) and [set up the 
client](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#set-up-legacy) for Databricks Connect. \nTo use Databricks Connect with [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) and Python, follow these instructions. \n1. To install JupyterLab, with your Python virtual environment activated, run the following command from your terminal or Command Prompt: \n```\npip3 install jupyterlab\n\n```\n2. To start JupyterLab in your web browser, run the following command from your activated Python virtual environment: \n```\njupyter lab\n\n``` \nIf JupyterLab does not appear in your web browser, copy the URL that starts with `localhost` or `127.0.0.1` from your virtual environment, and enter it in your web browser\u2019s address bar.\n3. Create a new notebook: in JupyterLab, click **File > New > Notebook** on the main menu, select **Python 3 (ipykernel)** and click **Select**.\n4. In the notebook\u2019s first cell, enter either the [example code](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#example-code-legacy) or your own code. If you use your own code, at minimum you must instantiate an instance of `SparkSession.builder.getOrCreate()`, as shown in the [example code](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#example-code-legacy).\n5. To run the notebook, click **Run > Run All Cells**.\n6. To debug the notebook, click the bug (**Enable Debugger**) icon next to **Python 3 (ipykernel)** in the notebook\u2019s toolbar. Set one or more breakpoints, and then click **Run > Run All Cells**.\n7. To shut down JupyterLab, click **File > Shut Down**. If the JupyterLab process is still running in your terminal or Command Prompt, stop this process by pressing `Ctrl + c` and then entering `y` to confirm. \nFor more specific debug instructions, see [Debugger](https://jupyterlab.readthedocs.io/en/stable/user/debugger.html). 
\n### [Classic Jupyter Notebook](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id5) \nNote \nBefore you begin to use Databricks Connect, you must [meet the requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements-legacy) and [set up the client](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#set-up-legacy) for Databricks Connect. \nThe configuration script for Databricks Connect automatically adds the package to your project configuration. To get started in a Python kernel, run: \n```\nfrom pyspark.sql import SparkSession\nspark = SparkSession.builder.getOrCreate()\n\n``` \nTo enable the `%sql` shorthand for running and visualizing SQL queries, use the following snippet: \n```\nfrom IPython.core.magic import line_magic, line_cell_magic, Magics, magics_class\n\n@magics_class\nclass DatabricksConnectMagics(Magics):\n\n@line_cell_magic\ndef sql(self, line, cell=None):\nif cell and line:\nraise ValueError(\"Line must be empty for cell magic\", line)\ntry:\nfrom autovizwidget.widget.utils import display_dataframe\nexcept ImportError:\nprint(\"Please run `pip install autovizwidget` to enable the visualization widget.\")\ndisplay_dataframe = lambda x: x\nreturn display_dataframe(self.get_spark().sql(cell or line).toPandas())\n\ndef get_spark(self):\nuser_ns = get_ipython().user_ns\nif \"spark\" in user_ns:\nreturn user_ns[\"spark\"]\nelse:\nfrom pyspark.sql import SparkSession\nuser_ns[\"spark\"] = SparkSession.builder.getOrCreate()\nreturn user_ns[\"spark\"]\n\nip = get_ipython()\nip.register_magics(DatabricksConnectMagics)\n\n``` \n#### Visual Studio Code \nNote \nBefore you begin to use Databricks Connect, you must [meet the requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements-legacy) and [set up the client](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#set-up-legacy) for Databricks Connect. 
\nTo use Databricks Connect with Visual Studio Code, do the following: \n1. Verify that the [Python extension](https://marketplace.visualstudio.com/items?itemName=ms-python.python) is installed.\n2. Open the Command Palette (**Command+Shift+P** on macOS and **Ctrl+Shift+P** on Windows/Linux).\n3. Select a Python interpreter. Go to **Code > Preferences > Settings**, and choose **python settings**.\n4. Run `databricks-connect get-jar-dir`.\n5. Add the directory returned from the command to the User Settings JSON under `python.venvPath`. This should be added to the Python Configuration.\n6. Disable the linter. Click the **\u2026** on the right side and **edit json settings**. The modified settings are as follows: \n![VS Code configuration](https://docs.databricks.com/_images/vscode.png)\n7. If running with a virtual environment, which is the recommended way to develop for Python in VS Code, in the Command Palette type `select python interpreter` and point to your environment that *matches* your cluster Python version. \n![Select Python interpreter](https://docs.databricks.com/_images/select-intepreter.png) \nFor example, if your cluster is Python 3.9, your development environment should be Python 3.9. \n![Python version](https://docs.databricks.com/_images/python35.png) \n### [PyCharm](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id6) \nNote \nBefore you begin to use Databricks Connect, you must [meet the requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements-legacy) and [set up the client](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#set-up-legacy) for Databricks Connect. \nThe configuration script for Databricks Connect automatically adds the package to your project configuration. \n#### Python 3 clusters \n1. When you create a PyCharm project, select **Existing Interpreter**. 
From the drop-down menu, select the Conda environment you created (see [Requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements)). \n![Select interpreter](https://docs.databricks.com/_images/interpreter.png)\n2. Go to **Run > Edit Configurations**.\n3. Add `PYSPARK_PYTHON=python3` as an environment variable. \n![Python 3 cluster configuration](https://docs.databricks.com/_images/python3-env.png) \n### [SparkR and RStudio Desktop](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id7) \nNote \nBefore you begin to use Databricks Connect, you must [meet the requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements-legacy) and [set up the client](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#set-up-legacy) for Databricks Connect. \nTo use Databricks Connect with SparkR and RStudio Desktop, do the following: \n1. Download and unpack the [open source Spark](https://spark.apache.org/downloads.html) distribution onto your development machine. Choose the same version as in your Databricks cluster (Hadoop 2.7).\n2. Run `databricks-connect get-jar-dir`. This command returns a path like `/usr/local/lib/python3.5/dist-packages/pyspark/jars`. Copy the file path of *one directory above* the JAR directory file path, for example, `/usr/local/lib/python3.5/dist-packages/pyspark`, which is the `SPARK_HOME` directory.\n3. Configure the Spark lib path and Spark home by adding them to the top of your R script. Set `<spark-lib-path>` to the directory where you unpacked the open source Spark package in step 1. Set `<spark-home-path>` to the Databricks Connect directory from step 2. \n```\n# Point to the OSS package path, e.g., /path/to/.../spark-2.4.0-bin-hadoop2.7\nlibrary(SparkR, lib.loc = .libPaths(c(file.path('<spark-lib-path>', 'R', 'lib'), .libPaths())))\n\n# Point to the Databricks Connect PySpark installation, e.g., /path/to/.../pyspark\nSys.setenv(SPARK_HOME = \"<spark-home-path>\")\n\n```\n4. 
Initiate a Spark session and start running SparkR commands. \n```\nsparkR.session()\n\ndf <- as.DataFrame(faithful)\nhead(df)\n\ndf1 <- dapply(df, function(x) { x }, schema(df))\ncollect(df1)\n\n``` \n### [sparklyr and RStudio Desktop](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id8) \nNote \nBefore you begin to use Databricks Connect, you must [meet the requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements-legacy) and [set up the client](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#set-up-legacy) for Databricks Connect. \nPreview \nThis feature is in [Public Preview](https://docs.databricks.com/release-notes/release-types.html). \nYou can copy sparklyr-dependent code that you\u2019ve developed locally using Databricks Connect and run it in a Databricks notebook or hosted RStudio Server in your Databricks workspace with minimal or no code changes. \nIn this section: \n* [Requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements)\n* [Install, configure, and use sparklyr](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#install-configure-and-use-sparklyr)\n* [Resources](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#resources)\n* [sparklyr and RStudio Desktop limitations](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#sparklyr-and-rstudio-desktop-limitations) \n#### [Requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id14) \n* sparklyr 1.2 or above.\n* Databricks Runtime 7.3 LTS or above with the matching version of Databricks Connect. \n#### [Install, configure, and use sparklyr](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id15) \n1. In RStudio Desktop, install sparklyr 1.2 or above from CRAN or install the latest master version from GitHub. 
\n```\n# Install from CRAN\ninstall.packages(\"sparklyr\")\n\n# Or install the latest master version from GitHub\ninstall.packages(\"devtools\")\ndevtools::install_github(\"sparklyr/sparklyr\")\n\n```\n2. Activate the Python environment with the correct version of Databricks Connect installed and run the following command in the terminal to get the `<spark-home-path>`: \n```\ndatabricks-connect get-spark-home\n\n```\n3. Initiate a Spark session and start running sparklyr commands. \n```\nlibrary(sparklyr)\nsc <- spark_connect(method = \"databricks\", spark_home = \"<spark-home-path>\")\n\niris_tbl <- copy_to(sc, iris, overwrite = TRUE)\n\nlibrary(dplyr)\nsrc_tbls(sc)\n\niris_tbl %>% count\n\n```\n4. Close the connection. \n```\nspark_disconnect(sc)\n\n``` \n#### [Resources](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id16) \nFor more information, see the sparklyr GitHub [README](https://github.com/sparklyr/sparklyr#connecting-through-databricks-connect). \nFor code examples, see [sparklyr](https://docs.databricks.com/sparkr/sparklyr.html). \n#### [sparklyr and RStudio Desktop limitations](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id17) \nThe following features are unsupported: \n* sparklyr streaming APIs\n* sparklyr ML APIs\n* broom APIs\n* csv\\_file serialization mode\n* spark submit \n### [IntelliJ (Scala or Java)](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id9) \nNote \nBefore you begin to use Databricks Connect, you must [meet the requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements-legacy) and [set up the client](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#set-up-legacy) for Databricks Connect. \nTo use Databricks Connect with IntelliJ (Scala or Java), do the following: \n1. Run `databricks-connect get-jar-dir`.\n2. Point the dependencies to the directory returned from the command. 
Go to **File > Project Structure > Modules > Dependencies > \u2018+\u2019 sign > JARs or Directories**. \n![IntelliJ JARs](https://docs.databricks.com/_images/intelli-j-jars.png) \nTo avoid conflicts, we strongly recommend removing any other Spark installations from your classpath. If this is not possible, make sure that the JARs you add are at the front of the classpath. In particular, they must be ahead of any other installed version of Spark (otherwise your code will either use one of those other Spark versions and run locally, or throw a `NoClassDefFoundError`).\n3. Check the setting of the breakpoint suspend option in IntelliJ. The default is **All** and will cause network timeouts if you set breakpoints for debugging. Set it to **Thread** to avoid stopping the background network threads. \n![IntelliJ Thread](https://docs.databricks.com/_images/intelli-j-thread.png) \n### [PyDev with Eclipse](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id10) \nNote \nBefore you begin to use Databricks Connect, you must [meet the requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements-legacy) and [set up the client](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#set-up-legacy) for Databricks Connect. \nTo use Databricks Connect and [PyDev](https://www.pydev.org/manual_101_install.html) with [Eclipse](https://www.eclipse.org/downloads), follow these instructions. \n1. Start Eclipse.\n2. Create a project: click **File > New > Project > PyDev > PyDev Project**, and then click **Next**.\n3. Specify a **Project name**.\n4. For **Project contents**, specify the path to your Python virtual environment.\n5. Click **Please configure an interpreter before proceeding**.\n6. Click **Manual config**.\n7. Click **New > Browse for python/pypy exe**.\n8. Browse to and select the full path to the Python interpreter that is referenced from the virtual environment, and then click **Open**.\n9. 
In the **Select interpreter** dialog, click **OK**.\n10. In the **Selection needed** dialog, click **OK**.\n11. In the **Preferences** dialog, click **Apply and Close**.\n12. In the **PyDev Project** dialog, click **Finish**.\n13. Click **Open Perspective**.\n14. Add to the project a Python code (`.py`) file that contains either the [example code](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#example-code-legacy) or your own code. If you use your own code, at minimum you must create a `SparkSession` by calling `SparkSession.builder.getOrCreate()`, as shown in the [example code](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#example-code-legacy).\n15. With the Python code file open, set any breakpoints where you want your code to pause while running.\n16. Click **Run > Run** or **Run > Debug**. \nFor more specific run and debug instructions, see [Running a Program](https://www.pydev.org/manual_101_run.html). \n### [Eclipse](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id11) \nNote \nBefore you begin to use Databricks Connect, you must [meet the requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements-legacy) and [set up the client](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#set-up-legacy) for Databricks Connect. \nTo use Databricks Connect and Eclipse, do the following: \n1. Run `databricks-connect get-jar-dir`.\n2. Point the external JARs configuration to the directory returned from the command. Go to **Project menu > Properties > Java Build Path > Libraries > Add External Jars**. \n![Eclipse external JAR configuration](https://docs.databricks.com/_images/eclipse.png) \nTo avoid conflicts, we strongly recommend removing any other Spark installations from your classpath. If this is not possible, make sure that the JARs you add are at the front of the classpath. 
In particular, they must be ahead of any other installed version of Spark (otherwise your code will either use one of those other Spark versions and run locally, or throw a `NoClassDefFoundError`). \n![Eclipse Spark configuration](https://docs.databricks.com/_images/eclipse2.png) \n### [SBT](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id12) \nNote \nBefore you begin to use Databricks Connect, you must [meet the requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements-legacy) and [set up the client](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#set-up-legacy) for Databricks Connect. \nTo use Databricks Connect with SBT, you must configure your `build.sbt` file to link against the Databricks Connect JARs instead of the usual Spark library dependency. You do this with the `unmanagedBase` directive in the following example build file, which assumes a Scala app that has a `com.example.Test` main object: \n#### `build.sbt` \n```\nname := \"hello-world\"\nversion := \"1.0\"\nscalaVersion := \"2.11.6\"\n// Set this to the path returned by `databricks-connect get-jar-dir`\nunmanagedBase := new java.io.File(\"/usr/local/lib/python2.7/dist-packages/pyspark/jars\")\nmainClass := Some(\"com.example.Test\")\n\n``` \n### [Spark shell](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id13) \nNote \nBefore you begin to use Databricks Connect, you must [meet the requirements](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#requirements-legacy) and [set up the client](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#set-up-legacy) for Databricks Connect. \nTo use Databricks Connect with the Spark shell and Python or Scala, follow these instructions. \n1. 
With your virtual environment activated, make sure that the `databricks-connect test` command ran successfully in [Set up the client](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#set-up-legacy).\n2. With your virtual environment activated, start the Spark shell. For Python, run the `pyspark` command. For Scala, run the `spark-shell` command. \n```\n# For Python:\npyspark\n\n``` \n```\n# For Scala:\nspark-shell\n\n```\n3. The Spark shell appears, for example for Python: \n```\nPython 3... (v3...)\n[Clang 6... (clang-6...)] on darwin\nType \"help\", \"copyright\", \"credits\" or \"license\" for more information.\nSetting default log level to \"WARN\".\nTo adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n../../.. ..:..:.. WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\nWelcome to\n____ __\n/ __/__ ___ _____/ /__\n_\\ \\/ _ \\/ _ `/ __/ '_/\n/__ / .__/\\_,_/_/ /_/\\_\\ version 3....\n/_/\n\nUsing Python version 3... (v3...)\nSpark context Web UI available at http://...:...\nSpark context available as 'sc' (master = local[*], app id = local-...).\nSparkSession available as 'spark'.\n>>>\n\n``` \nFor Scala: \n```\nSetting default log level to \"WARN\".\nTo adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).\n../../.. ..:..:.. WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable\nSpark context Web UI available at http://...\nSpark context available as 'sc' (master = local[*], app id = local-...).\nSpark session available as 'spark'.\nWelcome to\n____ __\n/ __/__ ___ _____/ /__\n_\\ \\/ _ \\/ _ `/ __/ '_/\n/___/ .__/\\_,_/_/ /_/\\_\\ version 3...\n/_/\n\nUsing Scala version 2... (OpenJDK 64-Bit Server VM, Java 1.8...)\nType in expressions to have them evaluated.\nType :help for more information.\n\nscala>\n\n```\n4. 
Refer to [Interactive Analysis with the Spark Shell](https://spark.apache.org/docs/latest/quick-start.html#interactive-analysis-with-the-spark-shell) for information about how to use the Spark shell with Python or Scala to run commands on your cluster. \nUse the built-in `spark` variable to represent the `SparkSession` on your running cluster, for example for Python: \n```\n>>> df = spark.read.table(\"samples.nyctaxi.trips\")\n>>> df.show(5)\n+--------------------+---------------------+-------------+-----------+----------+-----------+\n|tpep_pickup_datetime|tpep_dropoff_datetime|trip_distance|fare_amount|pickup_zip|dropoff_zip|\n+--------------------+---------------------+-------------+-----------+----------+-----------+\n| 2016-02-14 16:52:13| 2016-02-14 17:16:04| 4.94| 19.0| 10282| 10171|\n| 2016-02-04 18:44:19| 2016-02-04 18:46:00| 0.28| 3.5| 10110| 10110|\n| 2016-02-17 17:13:57| 2016-02-17 17:17:55| 0.7| 5.0| 10103| 10023|\n| 2016-02-18 10:36:07| 2016-02-18 10:41:45| 0.8| 6.0| 10022| 10017|\n| 2016-02-22 14:14:41| 2016-02-22 14:31:52| 4.51| 17.0| 10110| 10282|\n+--------------------+---------------------+-------------+-----------+----------+-----------+\nonly showing top 5 rows\n\n``` \nFor Scala: \n```\nscala> val df = spark.read.table(\"samples.nyctaxi.trips\")\nscala> df.show(5)\n+--------------------+---------------------+-------------+-----------+----------+-----------+\n|tpep_pickup_datetime|tpep_dropoff_datetime|trip_distance|fare_amount|pickup_zip|dropoff_zip|\n+--------------------+---------------------+-------------+-----------+----------+-----------+\n| 2016-02-14 16:52:13| 2016-02-14 17:16:04| 4.94| 19.0| 10282| 10171|\n| 2016-02-04 18:44:19| 2016-02-04 18:46:00| 0.28| 3.5| 10110| 10110|\n| 2016-02-17 17:13:57| 2016-02-17 17:17:55| 0.7| 5.0| 10103| 10023|\n| 2016-02-18 10:36:07| 2016-02-18 10:41:45| 0.8| 6.0| 10022| 10017|\n| 2016-02-22 14:14:41| 2016-02-22 14:31:52| 4.51| 17.0| 10110| 
10282|\n+--------------------+---------------------+-------------+-----------+----------+-----------+\nonly showing top 5 rows\n\n```\n5. To stop the Spark shell, press `Ctrl + d` or `Ctrl + z`, or run the command `quit()` or `exit()` for Python or `:q` or `:quit` for Scala.\n\n", "chunk_id": "c2f7581c547128800fd79a4059711f7f", "url": "https://docs.databricks.com/dev-tools/databricks-connect-legacy.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is Databricks Connect?\n##### Databricks Connect for Databricks Runtime 12.2 LTS and below\n###### Code examples\n\nThis simple code example queries the specified table and then shows the specified table\u2019s first 5 rows. To use a different table, adjust the call to `spark.read.table`. \n```\nfrom pyspark.sql.session import SparkSession\n\nspark = SparkSession.builder.getOrCreate()\n\ndf = spark.read.table(\"samples.nyctaxi.trips\")\ndf.show(5)\n\n``` \nThis longer code example does the following: \n1. Creates an in-memory DataFrame.\n2. Creates a table with the name `zzz_demo_temps_table` within the `default` schema. If the table with this name already exists, the table is deleted first. To use a different schema or table, adjust the calls to `spark.sql`, `temps.write.saveAsTable`, or both.\n3. Saves the DataFrame\u2019s contents to the table.\n4. Runs a `SELECT` query on the table\u2019s contents.\n5. Shows the query\u2019s result.\n6. Deletes the table. 
\n```\nfrom pyspark.sql import SparkSession\nfrom pyspark.sql.types import *\nfrom datetime import date\n\nspark = SparkSession.builder.appName('temps-demo').getOrCreate()\n\n# Create a Spark DataFrame consisting of high and low temperatures\n# by airport code and date.\nschema = StructType([\nStructField('AirportCode', StringType(), False),\nStructField('Date', DateType(), False),\nStructField('TempHighF', IntegerType(), False),\nStructField('TempLowF', IntegerType(), False)\n])\n\ndata = [\n[ 'BLI', date(2021, 4, 3), 52, 43],\n[ 'BLI', date(2021, 4, 2), 50, 38],\n[ 'BLI', date(2021, 4, 1), 52, 41],\n[ 'PDX', date(2021, 4, 3), 64, 45],\n[ 'PDX', date(2021, 4, 2), 61, 41],\n[ 'PDX', date(2021, 4, 1), 66, 39],\n[ 'SEA', date(2021, 4, 3), 57, 43],\n[ 'SEA', date(2021, 4, 2), 54, 39],\n[ 'SEA', date(2021, 4, 1), 56, 41]\n]\n\ntemps = spark.createDataFrame(data, schema)\n\n# Create a table on the Databricks cluster and then fill\n# the table with the DataFrame's contents.\n# If the table already exists from a previous run,\n# delete it first.\nspark.sql('USE default')\nspark.sql('DROP TABLE IF EXISTS zzz_demo_temps_table')\ntemps.write.saveAsTable('zzz_demo_temps_table')\n\n# Query the table on the Databricks cluster, returning rows\n# where the airport code is not BLI and the date is later\n# than 2021-04-01. 
Group the results and order by high\n# temperature in descending order.\ndf_temps = spark.sql(\"SELECT * FROM zzz_demo_temps_table \" \\\n\"WHERE AirportCode != 'BLI' AND Date > '2021-04-01' \" \\\n\"GROUP BY AirportCode, Date, TempHighF, TempLowF \" \\\n\"ORDER BY TempHighF DESC\")\ndf_temps.show()\n\n# Results:\n#\n# +-----------+----------+---------+--------+\n# |AirportCode| Date|TempHighF|TempLowF|\n# +-----------+----------+---------+--------+\n# | PDX|2021-04-03| 64| 45|\n# | PDX|2021-04-02| 61| 41|\n# | SEA|2021-04-03| 57| 43|\n# | SEA|2021-04-02| 54| 39|\n# +-----------+----------+---------+--------+\n\n# Clean up by deleting the table from the Databricks cluster.\nspark.sql('DROP TABLE zzz_demo_temps_table')\n\n``` \n```\nimport org.apache.spark.sql.SparkSession\nimport org.apache.spark.sql.types._\nimport org.apache.spark.sql.Row\nimport java.sql.Date\n\nobject Demo {\ndef main(args: Array[String]) {\nval spark = SparkSession.builder.master(\"local\").getOrCreate()\n\n// Create a Spark DataFrame consisting of high and low temperatures\n// by airport code and date.\nval schema = StructType(Array(\nStructField(\"AirportCode\", StringType, false),\nStructField(\"Date\", DateType, false),\nStructField(\"TempHighF\", IntegerType, false),\nStructField(\"TempLowF\", IntegerType, false)\n))\n\nval data = List(\nRow(\"BLI\", Date.valueOf(\"2021-04-03\"), 52, 43),\nRow(\"BLI\", Date.valueOf(\"2021-04-02\"), 50, 38),\nRow(\"BLI\", Date.valueOf(\"2021-04-01\"), 52, 41),\nRow(\"PDX\", Date.valueOf(\"2021-04-03\"), 64, 45),\nRow(\"PDX\", Date.valueOf(\"2021-04-02\"), 61, 41),\nRow(\"PDX\", Date.valueOf(\"2021-04-01\"), 66, 39),\nRow(\"SEA\", Date.valueOf(\"2021-04-03\"), 57, 43),\nRow(\"SEA\", Date.valueOf(\"2021-04-02\"), 54, 39),\nRow(\"SEA\", Date.valueOf(\"2021-04-01\"), 56, 41)\n)\n\nval rdd = spark.sparkContext.makeRDD(data)\nval temps = spark.createDataFrame(rdd, schema)\n\n// Create a table on the Databricks cluster and then fill\n// the table with the 
DataFrame's contents.\n// If the table already exists from a previous run,\n// delete it first.\nspark.sql(\"USE default\")\nspark.sql(\"DROP TABLE IF EXISTS zzz_demo_temps_table\")\ntemps.write.saveAsTable(\"zzz_demo_temps_table\")\n\n// Query the table on the Databricks cluster, returning rows\n// where the airport code is not BLI and the date is later\n// than 2021-04-01. Group the results and order by high\n// temperature in descending order.\nval df_temps = spark.sql(\"SELECT * FROM zzz_demo_temps_table \" +\n\"WHERE AirportCode != 'BLI' AND Date > '2021-04-01' \" +\n\"GROUP BY AirportCode, Date, TempHighF, TempLowF \" +\n\"ORDER BY TempHighF DESC\")\ndf_temps.show()\n\n// Results:\n//\n// +-----------+----------+---------+--------+\n// |AirportCode| Date|TempHighF|TempLowF|\n// +-----------+----------+---------+--------+\n// | PDX|2021-04-03| 64| 45|\n// | PDX|2021-04-02| 61| 41|\n// | SEA|2021-04-03| 57| 43|\n// | SEA|2021-04-02| 54| 39|\n// +-----------+----------+---------+--------+\n\n// Clean up by deleting the table from the Databricks cluster.\nspark.sql(\"DROP TABLE zzz_demo_temps_table\")\n}\n}\n\n``` \n```\nimport java.util.ArrayList;\nimport java.util.List;\nimport java.sql.Date;\nimport org.apache.spark.sql.SparkSession;\nimport org.apache.spark.sql.types.*;\nimport org.apache.spark.sql.Row;\nimport org.apache.spark.sql.RowFactory;\nimport org.apache.spark.sql.Dataset;\n\npublic class App {\npublic static void main(String[] args) throws Exception {\nSparkSession spark = SparkSession\n.builder()\n.appName(\"Temps Demo\")\n.config(\"spark.master\", \"local\")\n.getOrCreate();\n\n// Create a Spark DataFrame consisting of high and low temperatures\n// by airport code and date.\nStructType schema = new StructType(new StructField[] {\nnew StructField(\"AirportCode\", DataTypes.StringType, false, Metadata.empty()),\nnew StructField(\"Date\", DataTypes.DateType, false, Metadata.empty()),\nnew StructField(\"TempHighF\", DataTypes.IntegerType, false, 
Metadata.empty()),\nnew StructField(\"TempLowF\", DataTypes.IntegerType, false, Metadata.empty()),\n});\n\nList dataList = new ArrayList();\ndataList.add(RowFactory.create(\"BLI\", Date.valueOf(\"2021-04-03\"), 52, 43));\ndataList.add(RowFactory.create(\"BLI\", Date.valueOf(\"2021-04-02\"), 50, 38));\ndataList.add(RowFactory.create(\"BLI\", Date.valueOf(\"2021-04-01\"), 52, 41));\ndataList.add(RowFactory.create(\"PDX\", Date.valueOf(\"2021-04-03\"), 64, 45));\ndataList.add(RowFactory.create(\"PDX\", Date.valueOf(\"2021-04-02\"), 61, 41));\ndataList.add(RowFactory.create(\"PDX\", Date.valueOf(\"2021-04-01\"), 66, 39));\ndataList.add(RowFactory.create(\"SEA\", Date.valueOf(\"2021-04-03\"), 57, 43));\ndataList.add(RowFactory.create(\"SEA\", Date.valueOf(\"2021-04-02\"), 54, 39));\ndataList.add(RowFactory.create(\"SEA\", Date.valueOf(\"2021-04-01\"), 56, 41));\n\nDataset temps = spark.createDataFrame(dataList, schema);\n\n// Create a table on the Databricks cluster and then fill\n// the table with the DataFrame's contents.\n// If the table already exists from a previous run,\n// delete it first.\nspark.sql(\"USE default\");\nspark.sql(\"DROP TABLE IF EXISTS zzz_demo_temps_table\");\ntemps.write().saveAsTable(\"zzz_demo_temps_table\");\n\n// Query the table on the Databricks cluster, returning rows\n// where the airport code is not BLI and the date is later\n// than 2021-04-01. 
Group the results and order by high\n// temperature in descending order.\nDataset df_temps = spark.sql(\"SELECT * FROM zzz_demo_temps_table \" +\n\"WHERE AirportCode != 'BLI' AND Date > '2021-04-01' \" +\n\"GROUP BY AirportCode, Date, TempHighF, TempLowF \" +\n\"ORDER BY TempHighF DESC\");\ndf_temps.show();\n\n// Results:\n//\n// +-----------+----------+---------+--------+\n// |AirportCode| Date|TempHighF|TempLowF|\n// +-----------+----------+---------+--------+\n// | PDX|2021-04-03| 64| 45|\n// | PDX|2021-04-02| 61| 41|\n// | SEA|2021-04-03| 57| 43|\n// | SEA|2021-04-02| 54| 39|\n// +-----------+----------+---------+--------+\n\n// Clean up by deleting the table from the Databricks cluster.\nspark.sql(\"DROP TABLE zzz_demo_temps_table\");\n}\n}\n\n```\n\n", "chunk_id": "0d9b4398881eb6a38415bf610f3b3fee", "url": "https://docs.databricks.com/dev-tools/databricks-connect-legacy.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is Databricks Connect?\n##### Databricks Connect for Databricks Runtime 12.2 LTS and below\n###### Work with dependencies\n\nTypically your main class or Python file will have other dependency JARs and files. You can add such dependency JARs and files by calling `sparkContext.addJar(\"path-to-the-jar\")` or `sparkContext.addPyFile(\"path-to-the-file\")`. You can also add Egg files and zip files with the `addPyFile()` interface. Every time you run the code in your IDE, the dependency JARs and files are installed on the cluster. 
\n```\nfrom lib import Foo\nfrom pyspark.sql import SparkSession\n\nspark = SparkSession.builder.getOrCreate()\n\nsc = spark.sparkContext\n#sc.setLogLevel(\"INFO\")\n\nprint(\"Testing simple count\")\nprint(spark.range(100).count())\n\nprint(\"Testing addPyFile isolation\")\nsc.addPyFile(\"lib.py\")\nprint(sc.parallelize(range(10)).map(lambda i: Foo(2)).collect())\n\n# Contents of lib.py, the module imported above:\nclass Foo(object):\n    def __init__(self, x):\n        self.x = x\n\n``` \n**Python + Java UDFs** \n```\nfrom pyspark.sql import SparkSession\nfrom pyspark.sql.column import _to_java_column, _to_seq, Column\n\n## In this example, udf.jar contains compiled Java / Scala UDFs:\n#package com.example\n#\n#import org.apache.spark.sql._\n#import org.apache.spark.sql.expressions._\n#import org.apache.spark.sql.functions.udf\n#\n#object Test {\n# val plusOne: UserDefinedFunction = udf((i: Long) => i + 1)\n#}\n\nspark = SparkSession.builder \\\n    .config(\"spark.jars\", \"/path/to/udf.jar\") \\\n    .getOrCreate()\nsc = spark.sparkContext\n\ndef plus_one_udf(col):\n    f = sc._jvm.com.example.Test.plusOne()\n    return Column(f.apply(_to_seq(sc, [col], _to_java_column)))\n\nsc._jsc.addJar(\"/path/to/udf.jar\")\nspark.range(100).withColumn(\"plusOne\", plus_one_udf(\"id\")).show()\n\n``` \n```\npackage com.example\n\nimport org.apache.spark.sql.SparkSession\n\ncase class Foo(x: String)\n\nobject Test {\n  def main(args: Array[String]): Unit = {\n    val spark = SparkSession.builder()\n      ...\n      .getOrCreate();\n    spark.sparkContext.setLogLevel(\"INFO\")\n\n    println(\"Running simple show query...\")\n    spark.read.format(\"parquet\").load(\"/tmp/x\").show()\n\n    println(\"Running simple UDF query...\")\n    spark.sparkContext.addJar(\"./target/scala-2.11/hello-world_2.11-1.0.jar\")\n    spark.udf.register(\"f\", (x: Int) => x + 1)\n    spark.range(10).selectExpr(\"f(id)\").show()\n\n    println(\"Running custom objects query...\")\n    val objs = spark.sparkContext.parallelize(Seq(Foo(\"bye\"), Foo(\"hi\"))).collect()\n    println(objs.toSeq)\n  }\n}\n\n```\n\n", "chunk_id": 
"7295e3d2c9d0ffd479fee343d4a150f4", "url": "https://docs.databricks.com/dev-tools/databricks-connect-legacy.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is Databricks Connect?\n##### Databricks Connect for Databricks Runtime 12.2 LTS and below\n###### Access Databricks Utilities\n\nThis section describes how to use Databricks Connect to access [Databricks Utilities](https://docs.databricks.com/dev-tools/databricks-utils.html). \nYou can use `dbutils.fs` and `dbutils.secrets` utilities of the [Databricks Utilities (dbutils) reference](https://docs.databricks.com/dev-tools/databricks-utils.html) module.\nSupported commands are `dbutils.fs.cp`, `dbutils.fs.head`, `dbutils.fs.ls`, `dbutils.fs.mkdirs`, `dbutils.fs.mv`, `dbutils.fs.put`, `dbutils.fs.rm`, `dbutils.secrets.get`, `dbutils.secrets.getBytes`, `dbutils.secrets.list`, `dbutils.secrets.listScopes`.\nSee [File system utility (dbutils.fs)](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs) or run `dbutils.fs.help()` and [Secrets utility (dbutils.secrets)](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-secrets) or run `dbutils.secrets.help()`. 
\n```\nfrom pyspark.sql import SparkSession\nfrom pyspark.dbutils import DBUtils\n\nspark = SparkSession.builder.getOrCreate()\n\ndbutils = DBUtils(spark)\nprint(dbutils.fs.ls(\"dbfs:/\"))\nprint(dbutils.secrets.listScopes())\n\n``` \nWhen using Databricks Runtime 7.3 LTS or above, to access the DBUtils module in a way that works both locally and in Databricks clusters, use the following `get_dbutils()`: \n```\ndef get_dbutils(spark):\n    from pyspark.dbutils import DBUtils\n    return DBUtils(spark)\n\n``` \nOtherwise, use the following `get_dbutils()`: \n```\ndef get_dbutils(spark):\n    if spark.conf.get(\"spark.databricks.service.client.enabled\") == \"true\":\n        from pyspark.dbutils import DBUtils\n        return DBUtils(spark)\n    else:\n        import IPython\n        return IPython.get_ipython().user_ns[\"dbutils\"]\n\n``` \n```\nval dbutils = com.databricks.service.DBUtils\nprintln(dbutils.fs.ls(\"dbfs:/\"))\nprintln(dbutils.secrets.listScopes())\n\n``` \n### Copying files between local and remote filesystems \nYou can use `dbutils.fs` to copy files between your client and remote filesystems. The `file:/` scheme refers to the local filesystem on the client. \n```\nfrom pyspark.dbutils import DBUtils\ndbutils = DBUtils(spark)\n\ndbutils.fs.cp('file:/home/user/data.csv', 'dbfs:/uploads')\ndbutils.fs.cp('dbfs:/output/results.csv', 'file:/home/user/downloads/')\n\n``` \nThe maximum file size that can be transferred this way is 250 MB. \n### Enable `dbutils.secrets.get` \nBecause of security restrictions, the ability to call `dbutils.secrets.get` is disabled by default. 
Contact Databricks support to enable this feature for your workspace.\n\n", "chunk_id": "86484888d97b13fee44df4850b724e4c", "url": "https://docs.databricks.com/dev-tools/databricks-connect-legacy.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is Databricks Connect?\n##### Databricks Connect for Databricks Runtime 12.2 LTS and below\n###### Set Hadoop configurations\n\nOn the client you can set Hadoop configurations using the `spark.conf.set` API, which applies to SQL and DataFrame operations. Hadoop configurations set on the `sparkContext` must be set in the cluster configuration or using a notebook. This is because configurations set on `sparkContext` are not tied to user sessions but apply to the entire cluster.\n\n", "chunk_id": "70bc43e2c2bc00ccffd209f5134ae1b2", "url": "https://docs.databricks.com/dev-tools/databricks-connect-legacy.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is Databricks Connect?\n##### Databricks Connect for Databricks Runtime 12.2 LTS and below\n###### Troubleshooting\n\nRun `databricks-connect test` to check for connectivity issues. This section describes some common issues you may encounter with Databricks Connect and how to resolve them. 
\nIn this section: \n* [Python version mismatch](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#python-version-mismatch)\n* [Server not enabled](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#server-not-enabled)\n* [Conflicting PySpark installations](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#conflicting-pyspark-installations)\n* [Conflicting `SPARK_HOME`](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#conflicting-spark_home)\n* [Conflicting or Missing `PATH` entry for binaries](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#conflicting-or-missing-path-entry-for-binaries)\n* [Conflicting serialization settings on the cluster](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#conflicting-serialization-settings-on-the-cluster)\n* [Cannot find `winutils.exe` on Windows](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#cannot-find-winutilsexe-on-windows)\n* [The filename, directory name, or volume label syntax is incorrect on Windows](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#the-filename-directory-name-or-volume-label-syntax-is-incorrect-on-windows) \n### [Python version mismatch](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id18) \nCheck that the Python version you are using locally matches the version on the cluster at least down to the minor release (for example, `3.9.16` versus `3.9.15` is OK, `3.9` versus `3.8` is not). \nIf you have multiple Python versions installed locally, ensure that Databricks Connect is using the right one by setting the `PYSPARK_PYTHON` environment variable (for example, `PYSPARK_PYTHON=python3`). \n### [Server not enabled](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id19) \nEnsure the cluster has the Spark server enabled with `spark.databricks.service.server.enabled true`. 
You should see the following lines in the driver log if it is: \n```\n../../.. ..:..:.. INFO SparkConfUtils$: Set spark config:\nspark.databricks.service.server.enabled -> true\n...\n../../.. ..:..:.. INFO SparkContext: Loading Spark Service RPC Server\n../../.. ..:..:.. INFO SparkServiceRPCServer:\nStarting Spark Service RPC Server\n../../.. ..:..:.. INFO Server: jetty-9...\n../../.. ..:..:.. INFO AbstractConnector: Started ServerConnector@6a6c7f42\n{HTTP/1.1,[http/1.1]}{0.0.0.0:15001}\n../../.. ..:..:.. INFO Server: Started @5879ms\n\n``` \n### [Conflicting PySpark installations](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id20) \nThe `databricks-connect` package conflicts with PySpark. Having both installed will cause errors when initializing the Spark context in Python. This can manifest in several ways, including \u201cstream corrupted\u201d or \u201cclass not found\u201d errors. If you have PySpark installed in your Python environment, ensure it is uninstalled before installing databricks-connect. After uninstalling PySpark, make sure to fully re-install the Databricks Connect package: \n```\npip3 uninstall pyspark\npip3 uninstall databricks-connect\npip3 install --upgrade \"databricks-connect==12.2.*\" # or X.Y.* to match your specific cluster version.\n\n``` \n### [Conflicting `SPARK_HOME`](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id21) \nIf you have previously used Spark on your machine, your IDE may be configured to use one of those other versions of Spark rather than the Databricks Connect Spark. This can manifest in several ways, including \u201cstream corrupted\u201d or \u201cclass not found\u201d errors. 
You can see which version of Spark is being used by checking the value of the `SPARK_HOME` environment variable: \n```\nimport os\nprint(os.environ['SPARK_HOME'])\n\n``` \n```\nprintln(sys.env.get(\"SPARK_HOME\"))\n\n``` \n```\nSystem.out.println(System.getenv(\"SPARK_HOME\"));\n\n``` \n#### Resolution \nIf `SPARK_HOME` is set to a version of Spark other than the one in the client, you should unset the `SPARK_HOME` variable and try again. \nCheck your IDE environment variable settings, your `.bashrc`, `.zshrc`, or `.bash_profile` file, and anywhere else environment variables might be set. You will most likely have to quit and restart your IDE to purge the old state, and you may even need to create a new project if the problem persists. \nYou should not need to set `SPARK_HOME` to a new value; unsetting it should be sufficient. \n### [Conflicting or Missing `PATH` entry for binaries](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id22) \nIt is possible your PATH is configured so that commands like `spark-shell` will be running some other previously installed binary instead of the one provided with Databricks Connect. This can cause `databricks-connect test` to fail. You should make sure either the Databricks Connect binaries take precedence, or remove the previously installed ones. \nIf you can\u2019t run commands like `spark-shell`, it is also possible your PATH was not automatically set up by `pip3 install` and you\u2019ll need to add the installation `bin` dir to your PATH manually. It\u2019s possible to use Databricks Connect with IDEs even if this isn\u2019t set up. However, the `databricks-connect test` command will not work. \n### [Conflicting serialization settings on the cluster](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id23) \nIf you see \u201cstream corrupted\u201d errors when running `databricks-connect test`, this may be due to incompatible cluster serialization configs. 
For example, setting the `spark.io.compression.codec` config can cause this issue. To resolve this issue, consider removing these configs from the cluster settings, or setting the configuration in the Databricks Connect client. \n### [Cannot find `winutils.exe` on Windows](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id24) \nIf you are using Databricks Connect on Windows and see: \n```\nERROR Shell: Failed to locate the winutils binary in the hadoop binary path\njava.io.IOException: Could not locate executable null\\bin\\winutils.exe in the Hadoop binaries.\n\n``` \nFollow the instructions to [configure the Hadoop path on Windows](https://cwiki.apache.org/confluence/display/HADOOP2/Hadoop2OnWindows). \n### [The filename, directory name, or volume label syntax is incorrect on Windows](https://docs.databricks.com/dev-tools/databricks-connect-legacy.html#id25) \nIf you are using Windows and Databricks Connect and see: \n```\nThe filename, directory name, or volume label syntax is incorrect.\n\n``` \nEither Java or Databricks Connect was installed into a directory with a [space in your path](https://stackoverflow.com/questions/47028892/why-does-spark-shell-fail-with-the-filename-directory-name-or-volume-label-sy). You can work around this by either installing into a directory path without spaces, or configuring your path using the [short name form](https://stackoverflow.com/questions/892555/how-do-i-specify-c-program-files-without-a-space-in-it-for-programs-that-cant).\n\n", "chunk_id": "5f43c8f006d6bc56de4ad324727fd6a0", "url": "https://docs.databricks.com/dev-tools/databricks-connect-legacy.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n### What is Databricks Connect?\n##### Databricks Connect for Databricks Runtime 12.2 LTS and below\n###### Limitations\n\n* [Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/index.html). 
\n* Structured Streaming.\n* Running arbitrary code that is not a part of a Spark job on the remote cluster.\n* Native Scala, Python, and R APIs for Delta table operations (for example, `DeltaTable.forPath`) are not supported. However, the SQL API (`spark.sql(...)`) with Delta Lake operations and the Spark API (for example, `spark.read.load`) on Delta tables are both supported.\n* Copy into.\n* Using SQL functions, Python or Scala UDFs which are part of the server\u2019s catalog. However, locally introduced Scala and Python UDFs work.\n* [Apache Zeppelin](https://zeppelin.apache.org/) 0.7.x and below.\n* Connecting to clusters with [table access control](https://docs.databricks.com/data-governance/table-acls/table-acl.html).\n* Connecting to clusters with process isolation enabled (in other words, where `spark.databricks.pyspark.enableProcessIsolation` is set to `true`).\n* Delta `CLONE` SQL command.\n* Global temporary views.\n* [Koalas](https://docs.databricks.com/archive/legacy/koalas.html) and `pyspark.pandas`.\n* `CREATE TABLE table AS SELECT ...` SQL commands do not always work. Instead, use `spark.sql(\"SELECT ...\").write.saveAsTable(\"table\")`. 
\n* The following [Databricks Utilities (dbutils) reference](https://docs.databricks.com/dev-tools/databricks-utils.html): \n+ [credentials](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-credentials)\n+ [library](https://docs.databricks.com/archive/dev-tools/dbutils-library.html)\n+ [notebook workflow](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-workflow)\n+ [widgets](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets)\n* [AWS Glue catalog](https://docs.databricks.com/archive/external-metastores/aws-glue-metastore.html)\n\n", "chunk_id": "dd3793c7412bea7202b92d7d24f379e8", "url": "https://docs.databricks.com/dev-tools/databricks-connect-legacy.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 11.1 (unsupported)\n\nThe following release notes provide information about Databricks Runtime 11.1, powered by Apache Spark 3.3.0. Databricks released these images in July 2022.\n\n", "chunk_id": "f2beb1ee69d45d7ceaf25a5ab3f8965d", "url": "https://docs.databricks.com/archive/runtime-release-notes/11.1.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 11.1 (unsupported)\n##### New features and improvements\n\n* [Photon is GA](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#photon-is-ga)\n* [Photon: Supported instance types](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#photon-supported-instance-types)\n* [Change data feed can now automatically handle out-of-range timestamps](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#change-data-feed-can-now-automatically-handle-out-of-range-timestamps)\n* [Describe and show SQL functions now show Unity Catalog names in their output (Public 
Preview)](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#describe-and-show-sql-functions-now-show-unity-catalog-names-in-their-output-public-preview)\n* [Schema inference and evolution for Parquet files in Auto Loader (Public Preview)](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#schema-inference-and-evolution-for-parquet-files-in-auto-loader-public-preview)\n* [Auto Loader now supports schema evolution for Avro (GA)](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#auto-loader-now-supports-schema-evolution-for-avro-ga)\n* [Delta Lake support for dynamic partition overwrites](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#delta-lake-support-for-dynamic-partition-overwrites)\n* [Information schema support for objects created in Unity Catalog](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#information-schema-support-for-objects-created-in-unity-catalog)\n* [Informational constraints on Delta Lake tables with Unity Catalog (Public Preview)](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#informational-constraints-on-delta-lake-tables-with-unity-catalog-public-preview)\n* [Unity Catalog is GA](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#unity-catalog-is-ga)\n* [Delta Sharing is GA](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#delta-sharing-is-ga) \n### [Photon is GA](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#id1) \n[Photon](https://docs.databricks.com/compute/photon.html) is now generally available, beginning with Databricks Runtime 11.1. Photon is the native vectorized query engine on Databricks, written to be directly compatible with Apache Spark APIs so it works with your existing code. 
Photon is developed in C++ to take advantage of modern hardware, and uses the latest techniques in vectorized query processing to capitalize on data- and instruction-level parallelism in CPUs, enhancing performance on real-world data and applications\u2014all natively on your data lake. \nPhoton is part of a high-performance runtime that runs your existing SQL and DataFrame API calls faster and reduces your total cost per workload. Photon is used by default in Databricks SQL warehouses. \nNew features and improvements include: \n* New vectorized sort operator\n* New vectorized window functions\n* New instance types and sizes across all clouds \nLimitations: \n* Scala/Python UDFs are not supported by Photon\n* RDD is not supported by Photon\n* Structured Streaming is not supported by Photon \nFor more information, see the following Photon announcements. \n#### Photon: New vectorized sort operator \nPhoton now supports a vectorized sort when a query contains `SORT BY`, `CLUSTER BY`, or a window function with an `ORDER BY`. \nLimitations: Photon does not support a global `ORDER BY` clause. Sorts for window evaluation will photonize, but the global sort will continue to run in Spark. \n#### Photon: New vectorized window functions \nPhoton now supports vectorized window function evaluation for many frame types and functions. New window functions include: `row_number`, `rank`, `dense_rank`, `lag`, `lead`, `percent_rank`, `ntile`, and `nth_value`. Supported window frame types, where `N` stands for a row offset: running (`UNBOUNDED PRECEDING AND CURRENT ROW`), unbounded (`UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING`), growing (`UNBOUNDED PRECEDING AND N FOLLOWING`), and shrinking (`N PRECEDING AND UNBOUNDED FOLLOWING`). \nLimitations: \n* Photon supports only `ROWS` versions of all the frame types.\n* Photon does not yet support the sliding frame type (`N PRECEDING AND N FOLLOWING`). 
\n### [Photon: Supported instance types](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#id2) \n| i3 | i3en | i4i | im4gn | is4gen |\n| --- | --- | --- | --- | --- |\n| m5ad | m5d | m5dn | m6gd | |\n| r5d | r5dn | r6gd | x2gd | | \n### [Change data feed can now automatically handle out-of-range timestamps](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#id3) \nChange data feed (CDF) now has a new mode that lets you provide timestamps or versions past the latest commit\u2019s version without throwing errors. This mode is disabled by default. You can enable it by setting the configuration `spark.databricks.delta.changeDataFeed.timestampOutOfRange.enabled` to `true`. \n### [Describe and show SQL functions now show Unity Catalog names in their output (Public Preview)](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#id4) \nThe commands `DESC TABLE`, `DESC DATABASE`, `DESC SCHEMA`, `DESC NAMESPACE`, `DESC FUNCTION`, `EXPLAIN`, and `SHOW CREATE TABLE` now always show the catalog name in their output. \n### [Schema inference and evolution for Parquet files in Auto Loader (Public Preview)](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#id5) \nAuto Loader now supports schema inference and evolution for Parquet files. Just like JSON, CSV, and Avro formats, you can now use the rescued data column to rescue unexpected data that may appear in your Parquet files. This includes data that cannot be parsed in the data type that\u2019s expected, columns that have a different casing, or additional columns that are not part of the expected schema. You can configure Auto Loader to evolve the schema automatically when it encounters new columns in the incoming data. See [Configure schema inference and evolution in Auto Loader](https://docs.databricks.com/ingestion/auto-loader/schema.html). 
\n### [Auto Loader now supports schema evolution for Avro (GA)](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#id6) \nSee [Configure schema inference and evolution in Auto Loader](https://docs.databricks.com/ingestion/auto-loader/schema.html). \n### [Delta Lake support for dynamic partition overwrites](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#id7) \nDelta Lake now enables dynamic partition overwrite mode to overwrite all existing data in each logical partition for which the write will commit new data. See [Selectively overwrite data with Delta Lake](https://docs.databricks.com/delta/selective-overwrite.html). \n### [Information schema support for objects created in Unity Catalog](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#id8) \nInformation schema provides a SQL-based, self-describing API to the metadata of various database objects, including tables and views, constraints and routines.\nWithin the information schema you find a set of views describing the objects known to the schema\u2019s catalog that you are privileged to see.\nThe information schema of the `SYSTEM` catalog returns information about objects across all catalogs within the metastore.\nSee [Information schema](https://docs.databricks.com/sql/language-manual/sql-ref-information-schema.html). \n### [Informational constraints on Delta Lake tables with Unity Catalog (Public Preview)](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#id9) \nYou can now define informational primary key and foreign key constraints on Delta Lake tables with Unity Catalog.\nInformational constraints are not enforced.\nSee [CONSTRAINT clause](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-table-constraint.html). \n### [Unity Catalog is GA](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#id10) \nUnity Catalog is now generally available beginning with Databricks Runtime 11.1. 
See [What is Unity Catalog?](https://docs.databricks.com/data-governance/unity-catalog/index.html). \n### [Delta Sharing is GA](https://docs.databricks.com/archive/runtime-release-notes/11.1.html#id11) \nDelta Sharing is now generally available beginning with Databricks Runtime 11.1. \nDatabricks to Databricks Delta Sharing is fully managed without the need for exchanging tokens. You can create and manage providers, recipients, and shares in the UI or with SQL and REST APIs. \nSome features include restricting recipient access, querying data with IP access lists and region restrictions, and delegating Delta Sharing management to non-admins. You can also query changes to data or share incremental versions with Change Data Feeds. See [Share data and AI assets securely using Delta Sharing](https://docs.databricks.com/data-sharing/index.html).\n\n", "chunk_id": "c1837e83918bcf96fe27b5a7af8490e1", "url": "https://docs.databricks.com/archive/runtime-release-notes/11.1.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 11.1 (unsupported)\n##### Behavior changes\n\n### Sensitive properties redaction for DESCRIBE TABLE and SHOW TABLE PROPERTIES \nThe `DESCRIBE TABLE` and `SHOW TABLE PROPERTIES` commands now redact sensitive properties. \n### Job clusters default to single user access mode with Databricks Runtime 11.1 and higher \nTo be Unity Catalog capable, job clusters using Databricks Runtime 11.1 and higher created through the jobs UI or jobs API will default to single user access mode. Single User access mode supports most programming languages, cluster features and data governance features. 
You can still configure shared access mode through the UI or API, but languages or features might be limited.\n\n", "chunk_id": "b9c10926cc6899b4ae100496fa057d9c", "url": "https://docs.databricks.com/archive/runtime-release-notes/11.1.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 11.1 (unsupported)\n##### Library upgrades\n\n* Upgraded Python libraries: \n+ filelock from 3.6.0 to 3.7.1\n+ plotly from 5.6.0 to 5.8.2\n+ protobuf from 3.20.1 to 4.21.2\n* Upgraded R libraries: \n+ chron from 2.3-56 to 2.3-57\n+ DBI from 1.1.2 to 1.1.3\n+ dbplyr from 2.1.1 to 2.2.0\n+ e1071 from 1.7-9 to 1.7-11\n+ future from 1.25.0 to 1.26.1\n+ globals from 0.14.0 to 0.15.1\n+ hardhat from 0.2.0 to 1.1.0\n+ ipred from 0.9-12 to 0.9-13\n+ openssl from 2.0.0 to 2.0.2\n+ parallelly from 1.31.1 to 1.32.0\n+ processx from 3.5.3 to 3.6.1\n+ progressr from 0.10.0 to 0.10.1\n+ proxy from 0.4-26 to 0.4-27\n+ ps from 1.7.0 to 1.7.1\n+ randomForest from 4.7-1 to 4.7-1.1\n+ roxygen2 from 7.1.2 to 7.2.0\n+ Rserve from 1.8-10 to 1.8-11\n+ RSQLite from 2.2.13 to 2.2.14\n+ sparklyr from 1.7.5 to 1.7.7\n+ tinytex from 0.38 to 0.40\n+ usethis from 2.1.5 to 2.1.6\n+ xfun from 0.30 to 0.31\n* Upgraded Java libraries: \n+ io.delta.delta-sharing-spark\\_2.12 from 0.4.0 to 0.5.0\n\n", "chunk_id": "927749f98938caaae32fb73782cf2d3d", "url": "https://docs.databricks.com/archive/runtime-release-notes/11.1.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 11.1 (unsupported)\n##### Apache Spark\n\nDatabricks Runtime 11.2 includes Apache Spark 3.3.0. 
This release includes all Spark fixes and improvements\nincluded in [Databricks Runtime 11.1 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/11.1.html), as well as the following additional bug fixes and improvements made to Spark: \n* [[SPARK-40054]](https://issues.apache.org/jira/browse/SPARK-40054) [SQL] Restore the error handling syntax of try\\_cast()\n* [[SPARK-39489]](https://issues.apache.org/jira/browse/SPARK-39489) [CORE] Improve event logging JsonProtocol performance by using Jackson instead of Json4s\n* [[SPARK-39319]](https://issues.apache.org/jira/browse/SPARK-39319) [CORE][SQL] Make query contexts as a part of `SparkThrowable`\n* [[SPARK-40085]](https://issues.apache.org/jira/browse/SPARK-40085) [SQL] Use INTERNAL\\_ERROR error class instead of IllegalStateException to indicate bugs\n* [[SPARK-40001]](https://issues.apache.org/jira/browse/SPARK-40001) [SQL] Make NULL writes to JSON DEFAULT columns write \u2018null\u2019 to storage\n* [[SPARK-39635]](https://issues.apache.org/jira/browse/SPARK-39635) [SQL] Support driver metrics in DS v2 custom metric API\n* [[SPARK-39184]](https://issues.apache.org/jira/browse/SPARK-39184) [SQL] Handle undersized result array in date and timestamp sequences\n* [[SPARK-40019]](https://issues.apache.org/jira/browse/SPARK-40019) [SQL] Refactor comment of ArrayType\u2019s containsNull and refactor the misunderstanding logics in collectionOperator\u2019s expression about `containsNull`\n* [[SPARK-39989]](https://issues.apache.org/jira/browse/SPARK-39989) [SQL] Support estimate column statistics if it is foldable expression\n* [[SPARK-39926]](https://issues.apache.org/jira/browse/SPARK-39926) [SQL] Fix bug in column DEFAULT support for non-vectorized Parquet scans\n* [[SPARK-40052]](https://issues.apache.org/jira/browse/SPARK-40052) [SQL] Handle direct byte buffers in VectorizedDeltaBinaryPackedReader\n* [[SPARK-40044]](https://issues.apache.org/jira/browse/SPARK-40044) [SQL] Fix the target interval 
type in cast overflow errors\n* [[SPARK-39835]](https://issues.apache.org/jira/browse/SPARK-39835) [SQL] Fix EliminateSorts remove global sort below the local sort\n* [[SPARK-40002]](https://issues.apache.org/jira/browse/SPARK-40002) [SQL] Don\u2019t push down limit through window using ntile\n* [[SPARK-39976]](https://issues.apache.org/jira/browse/SPARK-39976) [SQL] ArrayIntersect should handle null in left expression correctly\n* [[SPARK-39985]](https://issues.apache.org/jira/browse/SPARK-39985) [SQL] Enable implicit DEFAULT column values in inserts from DataFrames\n* [[SPARK-39776]](https://issues.apache.org/jira/browse/SPARK-39776) [SQL] JOIN verbose string should add Join type\n* [[SPARK-38901]](https://issues.apache.org/jira/browse/SPARK-38901) [SQL] DS V2 supports push down misc functions\n* [[SPARK-40028]](https://issues.apache.org/jira/browse/SPARK-40028) [SQL][FollowUp] Improve examples of string functions\n* [[SPARK-39983]](https://issues.apache.org/jira/browse/SPARK-39983) [CORE][SQL] Do not cache unserialized broadcast relations on the driver\n* [[SPARK-39812]](https://issues.apache.org/jira/browse/SPARK-39812) [SQL] Simplify code which construct `AggregateExpression` with `toAggregateExpression`\n* [[SPARK-40028]](https://issues.apache.org/jira/browse/SPARK-40028) [SQL] Add binary examples for string expressions\n* [[SPARK-39981]](https://issues.apache.org/jira/browse/SPARK-39981) [SQL] Throw the exception QueryExecutionErrors.castingCauseOverflowErrorInTableInsert in Cast\n* [[SPARK-40007]](https://issues.apache.org/jira/browse/SPARK-40007) [PYTHON][SQL] Add \u2018mode\u2019 to functions\n* [[SPARK-40008]](https://issues.apache.org/jira/browse/SPARK-40008) [SQL] Support casting of integrals to ANSI intervals\n* [[SPARK-40003]](https://issues.apache.org/jira/browse/SPARK-40003) [PYTHON][SQL] Add \u2018median\u2019 to functions\n* [[SPARK-39952]](https://issues.apache.org/jira/browse/SPARK-39952) [SQL] SaveIntoDataSourceCommand should recache result 
relation\n* [[SPARK-39951]](https://issues.apache.org/jira/browse/SPARK-39951) [SQL] Update Parquet V2 columnar check for nested fields\n* [[SPARK-39775]](https://issues.apache.org/jira/browse/SPARK-39775) [CORE][AVRO] Disable validate default values when parsing Avro schemas\n* [[SPARK-33236]](https://issues.apache.org/jira/browse/SPARK-33236) [shuffle] Backport to DBR 11.x: Enable Push-based shuffle service to store state in NM level DB for work preserving restart\n* [[SPARK-39836]](https://issues.apache.org/jira/browse/SPARK-39836) [SQL] Simplify V2ExpressionBuilder by extract common method.\n* [[SPARK-39867]](https://issues.apache.org/jira/browse/SPARK-39867) [SQL] Global limit should not inherit OrderPreservingUnaryNode\n* [[SPARK-39873]](https://issues.apache.org/jira/browse/SPARK-39873) [SQL] Remove `OptimizeLimitZero` and merge it into `EliminateLimits`\n* [[SPARK-39961]](https://issues.apache.org/jira/browse/SPARK-39961) [SQL] DS V2 push-down translate Cast if the cast is safe\n* [[SPARK-39872]](https://issues.apache.org/jira/browse/SPARK-39872) [SQL] Change to use `BytePackerForLong#unpack8Values` with Array input api in `VectorizedDeltaBinaryPackedReader`\n* [[SPARK-39858]](https://issues.apache.org/jira/browse/SPARK-39858) [SQL] Remove unnecessary `AliasHelper` or `PredicateHelper` for some rules\n* [[SPARK-39962]](https://issues.apache.org/jira/browse/SPARK-39962) [WARMFIX][ES-393486][PYTHON][SQL] Apply projection when group attributes are empty\n* [[SPARK-39900]](https://issues.apache.org/jira/browse/SPARK-39900) [SQL] Address partial or negated condition in binary format\u2019s predicate pushdown\n* [[SPARK-39904]](https://issues.apache.org/jira/browse/SPARK-39904) [SQL] Rename inferDate to prefersDate and clarify semantics of the option in CSV data source\n* [[SPARK-39958]](https://issues.apache.org/jira/browse/SPARK-39958) [SQL] Add warning log when unable to load custom metric object\n* 
[[SPARK-39936]](https://issues.apache.org/jira/browse/SPARK-39936) [SQL] Store schema in properties for Spark Views\n* [[SPARK-39932]](https://issues.apache.org/jira/browse/SPARK-39932) [SQL] WindowExec should clear the final partition buffer\n* [[SPARK-37194]](https://issues.apache.org/jira/browse/SPARK-37194) [SQL] Avoid unnecessary sort in v1 write if it\u2019s not dynamic partition\n* [[SPARK-39902]](https://issues.apache.org/jira/browse/SPARK-39902) [SQL] Add Scan details to spark plan scan node in SparkUI\n* [[SPARK-39865]](https://issues.apache.org/jira/browse/SPARK-39865) [SQL] Show proper error messages on the overflow errors of table insert\n* [[SPARK-39940]](https://issues.apache.org/jira/browse/SPARK-39940) [SS] Refresh catalog table on streaming query with DSv1 sink\n* [[SPARK-39827]](https://issues.apache.org/jira/browse/SPARK-39827) [SQL] Use the error class `ARITHMETIC_OVERFLOW` on int overflow in `add_months()`\n* [[SPARK-39914]](https://issues.apache.org/jira/browse/SPARK-39914) [SQL] Add DS V2 Filter to V1 Filter conversion\n* [[SPARK-39857]](https://issues.apache.org/jira/browse/SPARK-39857) [SQL] Manual DBR 11.x backport; V2ExpressionBuilder uses the wrong LiteralValue data type for In predicate #43454\n* [[SPARK-39840]](https://issues.apache.org/jira/browse/SPARK-39840) [SQL][PYTHON] Factor PythonArrowInput out as a symmetry to PythonArrowOutput\n* [[SPARK-39651]](https://issues.apache.org/jira/browse/SPARK-39651) [SQL] Prune filter condition if compare with rand is deterministic\n* [[SPARK-39877]](https://issues.apache.org/jira/browse/SPARK-39877) [PYTHON] Add unpivot to PySpark DataFrame API\n* [[SPARK-39847]](https://issues.apache.org/jira/browse/SPARK-39847) [WARMFIX][SS] Fix race condition in RocksDBLoader.loadLibrary() if caller thread is interrupted\n* [[SPARK-39909]](https://issues.apache.org/jira/browse/SPARK-39909) [SQL] Organize the check of push down information for JDBCV2Suite\n* 
[[SPARK-39834]](https://issues.apache.org/jira/browse/SPARK-39834) [SQL][SS] Include the origin stats and constraints for LogicalRDD if it comes from DataFrame\n* [[SPARK-39849]](https://issues.apache.org/jira/browse/SPARK-39849) [SQL] Dataset.as(StructType) fills missing new columns with null value\n* [[SPARK-39860]](https://issues.apache.org/jira/browse/SPARK-39860) [SQL] More expressions should extend Predicate\n* [[SPARK-39823]](https://issues.apache.org/jira/browse/SPARK-39823) [SQL][PYTHON] Rename Dataset.as as Dataset.to and add DataFrame.to in PySpark\n* [[SPARK-39918]](https://issues.apache.org/jira/browse/SPARK-39918) [SQL][MINOR] Replace the wording \u201cun-comparable\u201d with \u201cincomparable\u201d in error message\n* [[SPARK-39857]](https://issues.apache.org/jira/browse/SPARK-39857) [SQL][3.3] V2ExpressionBuilder uses the wrong LiteralValue data type for In predicate\n* [[SPARK-39862]](https://issues.apache.org/jira/browse/SPARK-39862) [SQL] Manual backport for PR 43654 targeting DBR 11.x: Update SQLConf.DEFAULT\\_COLUMN\\_ALLOWED\\_PROVIDERS to allow/deny ALTER TABLE \u2026 ADD COLUMN commands separately.\n* [[SPARK-39844]](https://issues.apache.org/jira/browse/SPARK-39844) [SQL] Manual backport for PR 43652 targeting DBR 11.x\n* [[SPARK-39899]](https://issues.apache.org/jira/browse/SPARK-39899) [SQL] Fix passing of message parameters to `InvalidUDFClassException`\n* [[SPARK-39890]](https://issues.apache.org/jira/browse/SPARK-39890) [SQL] Make TakeOrderedAndProjectExec inherit AliasAwareOutputOrdering\n* [[SPARK-39809]](https://issues.apache.org/jira/browse/SPARK-39809) [PYTHON] Support CharType in PySpark\n* [[SPARK-38864]](https://issues.apache.org/jira/browse/SPARK-38864) [SQL] Add unpivot / melt to Dataset\n* [[SPARK-39864]](https://issues.apache.org/jira/browse/SPARK-39864) [SQL] Lazily register ExecutionListenerBus\n* [[SPARK-39808]](https://issues.apache.org/jira/browse/SPARK-39808) [SQL] Support aggregate function MODE\n* 
[[SPARK-39839]](https://issues.apache.org/jira/browse/SPARK-39839) [SQL] Handle special case of null variable-length Decimal with non-zero offsetAndSize in UnsafeRow structural integrity check\n* [[SPARK-39875]](https://issues.apache.org/jira/browse/SPARK-39875) [SQL] Change `protected` method in final class to `private` or `package-visible`\n* [[SPARK-39731]](https://issues.apache.org/jira/browse/SPARK-39731) [SQL] Fix issue in CSV and JSON data sources when parsing dates in \u201cyyyyMMdd\u201d format with CORRECTED time parser policy\n* [[SPARK-39805]](https://issues.apache.org/jira/browse/SPARK-39805) [SS] Deprecate Trigger.Once and Promote Trigger.AvailableNow\n* [[SPARK-39784]](https://issues.apache.org/jira/browse/SPARK-39784) [SQL] Put Literal values on the right side of the data source filter after translating Catalyst Expression to data source filter\n* [[SPARK-39672]](https://issues.apache.org/jira/browse/SPARK-39672) [SQL][3.1] Fix removing project before filter with correlated subquery\n* [[SPARK-39552]](https://issues.apache.org/jira/browse/SPARK-39552) [SQL] Unify v1 and v2 `DESCRIBE TABLE`\n* [[SPARK-39806]](https://issues.apache.org/jira/browse/SPARK-39806) [SQL] Accessing `_metadata` on partitioned table can crash a query\n* [[SPARK-39810]](https://issues.apache.org/jira/browse/SPARK-39810) [SQL] Catalog.tableExists should handle nested namespace\n* [[SPARK-37287]](https://issues.apache.org/jira/browse/SPARK-37287) [SQL] Pull out dynamic partition and bucket sort from FileFormatWriter\n* [[SPARK-39469]](https://issues.apache.org/jira/browse/SPARK-39469) [SQL] Infer date type for CSV schema inference\n* [[SPARK-39148]](https://issues.apache.org/jira/browse/SPARK-39148) [SQL] DS V2 aggregate push down can work with OFFSET or LIMIT\n* [[SPARK-39818]](https://issues.apache.org/jira/browse/SPARK-39818) [SQL] Fix bug in ARRAY, STRUCT, MAP types with DEFAULT values with NULL field(s)\n* [[SPARK-39792]](https://issues.apache.org/jira/browse/SPARK-39792) 
[SQL] Add DecimalDivideWithOverflowCheck for decimal average\n* [[SPARK-39798]](https://issues.apache.org/jira/browse/SPARK-39798) [SQL] Replace `toSeq.toArray` with `.toArray[Any]` in constructor of `GenericArrayData`\n* [[SPARK-39759]](https://issues.apache.org/jira/browse/SPARK-39759) [SQL] Implement listIndexes in JDBC (H2 dialect)\n* [[SPARK-39385]](https://issues.apache.org/jira/browse/SPARK-39385) [SQL] Supports push down `REGR_AVGX` and `REGR_AVGY`\n* [[SPARK-39787]](https://issues.apache.org/jira/browse/SPARK-39787) [SQL] Use error class in the parsing error of function to\\_timestamp\n* [[SPARK-39760]](https://issues.apache.org/jira/browse/SPARK-39760) [PYTHON] Support Varchar in PySpark\n* [[SPARK-39557]](https://issues.apache.org/jira/browse/SPARK-39557) [SQL] Manual backport to DBR 11.x: Support ARRAY, STRUCT, MAP types as DEFAULT values\n* [[SPARK-39758]](https://issues.apache.org/jira/browse/SPARK-39758) [SQL][3.3] Fix NPE from the regexp functions on invalid patterns\n* [[SPARK-39749]](https://issues.apache.org/jira/browse/SPARK-39749) [SQL] ANSI SQL mode: Use plain string representation on casting Decimal to String\n* [[SPARK-39704]](https://issues.apache.org/jira/browse/SPARK-39704) [SQL] Implement createIndex & dropIndex & indexExists in JDBC (H2 dialect)\n* [[SPARK-39803]](https://issues.apache.org/jira/browse/SPARK-39803) [SQL] Use `LevenshteinDistance` instead of `StringUtils.getLevenshteinDistance`\n* [[SPARK-39339]](https://issues.apache.org/jira/browse/SPARK-39339) [SQL] Support TimestampNTZ type in JDBC data source\n* [[SPARK-39781]](https://issues.apache.org/jira/browse/SPARK-39781) [SS] Add support for providing max\\_open\\_files to rocksdb state store provider\n* [[SPARK-39719]](https://issues.apache.org/jira/browse/SPARK-39719) [R] Implement databaseExists/getDatabase in SparkR support 3L namespace\n* [[SPARK-39751]](https://issues.apache.org/jira/browse/SPARK-39751) [SQL] Rename hash aggregate key probes metric\n* 
[[SPARK-39772]](https://issues.apache.org/jira/browse/SPARK-39772) [SQL] namespace should be null when database is null in the old constructors\n* [[SPARK-39625]](https://issues.apache.org/jira/browse/SPARK-39625) [SPARK-38904][SQL] Add Dataset.as(StructType)\n* [[SPARK-39384]](https://issues.apache.org/jira/browse/SPARK-39384) [SQL] Compile built-in linear regression aggregate functions for JDBC dialect\n* [[SPARK-39720]](https://issues.apache.org/jira/browse/SPARK-39720) [R] Implement tableExists/getTable in SparkR for 3L namespace\n* [[SPARK-39744]](https://issues.apache.org/jira/browse/SPARK-39744) [SQL] Add the `REGEXP_INSTR` function\n* [[SPARK-39716]](https://issues.apache.org/jira/browse/SPARK-39716) [R] Make currentDatabase/setCurrentDatabase/listCatalogs in SparkR support 3L namespace\n* [[SPARK-39788]](https://issues.apache.org/jira/browse/SPARK-39788) [SQL] Rename `catalogName` to `dialectName` for `JdbcUtils`\n* [[SPARK-39647]](https://issues.apache.org/jira/browse/SPARK-39647) [CORE] Register the executor with ESS before registering the BlockManager\n* [[SPARK-39754]](https://issues.apache.org/jira/browse/SPARK-39754) [CORE][SQL] Remove unused `import` or unnecessary `{}`\n* [[SPARK-39706]](https://issues.apache.org/jira/browse/SPARK-39706) [SQL] Set missing column with defaultValue as constant in `ParquetColumnVector`\n* [[SPARK-39699]](https://issues.apache.org/jira/browse/SPARK-39699) [SQL] Make CollapseProject smarter about collection creation expressions\n* [[SPARK-39737]](https://issues.apache.org/jira/browse/SPARK-39737) [SQL] `PERCENTILE_CONT` and `PERCENTILE_DISC` should support aggregate filter\n* [[SPARK-39579]](https://issues.apache.org/jira/browse/SPARK-39579) [SQL][PYTHON][R] Make ListFunctions/getFunction/functionExists compatible with 3 layer namespace\n* [[SPARK-39627]](https://issues.apache.org/jira/browse/SPARK-39627) [SQL] JDBC V2 pushdown should unify the compile API\n* 
[[SPARK-39748]](https://issues.apache.org/jira/browse/SPARK-39748) [SQL][SS] Include the origin logical plan for LogicalRDD if it comes from DataFrame\n* [[SPARK-39385]](https://issues.apache.org/jira/browse/SPARK-39385) [SQL] Translate linear regression aggregate functions for pushdown\n* [[SPARK-39695]](https://issues.apache.org/jira/browse/SPARK-39695) [SQL] Add the `REGEXP_SUBSTR` function\n* [[SPARK-39667]](https://issues.apache.org/jira/browse/SPARK-39667) [SQL] Add another workaround when there is not enough memory to build and broadcast the table\n* [[SPARK-39666]](https://issues.apache.org/jira/browse/SPARK-39666) [ES-337834][SQL] Use UnsafeProjection.create to respect `spark.sql.codegen.factoryMode` in ExpressionEncoder\n* [[SPARK-39643]](https://issues.apache.org/jira/browse/SPARK-39643) [SQL] Prohibit subquery expressions in DEFAULT values\n* [[SPARK-38647]](https://issues.apache.org/jira/browse/SPARK-38647) [SQL] Add SupportsReportOrdering mix in interface for Scan (DataSourceV2)\n* [[SPARK-39497]](https://issues.apache.org/jira/browse/SPARK-39497) [SQL] Improve the analysis exception of missing map key column\n* [[SPARK-39661]](https://issues.apache.org/jira/browse/SPARK-39661) [SQL] Avoid creating unnecessary SLF4J Logger\n* [[SPARK-39713]](https://issues.apache.org/jira/browse/SPARK-39713) [SQL] ANSI mode: add suggestion of using try\\_element\\_at for INVALID\\_ARRAY\\_INDEX error\n* [[SPARK-38899]](https://issues.apache.org/jira/browse/SPARK-38899) [SQL] DS V2 supports push down datetime functions\n* [[SPARK-39638]](https://issues.apache.org/jira/browse/SPARK-39638) [SQL] Change to use `ConstantColumnVector` to store partition columns in `OrcColumnarBatchReader`\n* [[SPARK-39653]](https://issues.apache.org/jira/browse/SPARK-39653) [SQL] Clean up `ColumnVectorUtils#populate(WritableColumnVector, InternalRow, int)` from `ColumnVectorUtils`\n* [[SPARK-39231]](https://issues.apache.org/jira/browse/SPARK-39231) [SQL] Use `ConstantColumnVector` instead 
of `On/OffHeapColumnVector` to store partition columns in `VectorizedParquetRecordReader`\n* [[SPARK-39547]](https://issues.apache.org/jira/browse/SPARK-39547) [SQL] V2SessionCatalog should not throw NoSuchDatabaseException in loadNamespaceMetadata\n* [[SPARK-39447]](https://issues.apache.org/jira/browse/SPARK-39447) [SQL] Avoid AssertionError in AdaptiveSparkPlanExec.doExecuteBroadcast\n* [[SPARK-39492]](https://issues.apache.org/jira/browse/SPARK-39492) [SQL] Rework MISSING\\_COLUMN\n* [[SPARK-39679]](https://issues.apache.org/jira/browse/SPARK-39679) [SQL] TakeOrderedAndProjectExec should respect child output ordering\n* [[SPARK-39606]](https://issues.apache.org/jira/browse/SPARK-39606) [SQL] Use child stats to estimate order operator\n* [[SPARK-39611]](https://issues.apache.org/jira/browse/SPARK-39611) [PYTHON][PS] Fix wrong aliases in `__array_ufunc__`\n* [[SPARK-39656]](https://issues.apache.org/jira/browse/SPARK-39656) [SQL][3.3] Fix wrong namespace in DescribeNamespaceExec\n* [[SPARK-39675]](https://issues.apache.org/jira/browse/SPARK-39675) [SQL] Switch \u2018spark.sql.codegen.factoryMode\u2019 configuration from testing purpose to internal purpose\n* [[SPARK-39139]](https://issues.apache.org/jira/browse/SPARK-39139) [SQL] DS V2 supports push down DS V2 UDF\n* [[SPARK-39434]](https://issues.apache.org/jira/browse/SPARK-39434) [SQL] Provide runtime error query context when array index is out of bounds\n* [[SPARK-39479]](https://issues.apache.org/jira/browse/SPARK-39479) [SQL] DS V2 supports push down math functions(non ANSI)\n* [[SPARK-39618]](https://issues.apache.org/jira/browse/SPARK-39618) [SQL] Add the `REGEXP_COUNT` function\n* [[SPARK-39553]](https://issues.apache.org/jira/browse/SPARK-39553) [CORE] Multi-thread unregister shuffle shouldn\u2019t throw NPE when using Scala 2.13\n* [[SPARK-38755]](https://issues.apache.org/jira/browse/SPARK-38755) [PYTHON][3.3] Add file to address missing pandas general functions\n* 
[[SPARK-39444]](https://issues.apache.org/jira/browse/SPARK-39444) [SQL] Add OptimizeSubqueries into nonExcludableRules list\n* [[SPARK-39316]](https://issues.apache.org/jira/browse/SPARK-39316) [SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic\n* [[SPARK-39505]](https://issues.apache.org/jira/browse/SPARK-39505) [UI] Escape log content rendered in UI\n* [[SPARK-39448]](https://issues.apache.org/jira/browse/SPARK-39448) [SQL] Add `ReplaceCTERefWithRepartition` into `nonExcludableRules` list\n* [[SPARK-37961]](https://issues.apache.org/jira/browse/SPARK-37961) [SQL] Override maxRows/maxRowsPerPartition for some logical operators\n* [[SPARK-35223]](https://issues.apache.org/jira/browse/SPARK-35223) Revert Add IssueNavigationLink\n* [[SPARK-39633]](https://issues.apache.org/jira/browse/SPARK-39633) [SQL] Support timestamp in seconds for TimeTravel using Dataframe options\n* [[SPARK-38796]](https://issues.apache.org/jira/browse/SPARK-38796) [SQL] Update documentation for number format strings with the {try\\_}to\\_number functions\n* [[SPARK-39650]](https://issues.apache.org/jira/browse/SPARK-39650) [SS] Fix incorrect value schema in streaming deduplication with backward compatibility\n* [[SPARK-39636]](https://issues.apache.org/jira/browse/SPARK-39636) [CORE][UI] Fix multiple bugs in JsonProtocol, impacting off heap StorageLevels and Task/Executor ResourceRequests\n* [[SPARK-39432]](https://issues.apache.org/jira/browse/SPARK-39432) [SQL] Return ELEMENT\\_AT\\_BY\\_INDEX\\_ZERO from element\\_at(\\*, 0)\n* [[SPARK-39349]](https://issues.apache.org/jira/browse/SPARK-39349) Add a centralized CheckError method for QA of error path\n* [[SPARK-39453]](https://issues.apache.org/jira/browse/SPARK-39453) [SQL] DS V2 supports push down misc non-aggregate functions(non ANSI)\n* [[SPARK-38978]](https://issues.apache.org/jira/browse/SPARK-38978) [SQL] DS V2 supports push down OFFSET operator\n* 
[[SPARK-39567]](https://issues.apache.org/jira/browse/SPARK-39567) [SQL] Support ANSI intervals in the percentile functions\n* [[SPARK-39383]](https://issues.apache.org/jira/browse/SPARK-39383) [SQL] Support DEFAULT columns in ALTER TABLE ALTER COLUMNS to V2 data sources\n* [[SPARK-39396]](https://issues.apache.org/jira/browse/SPARK-39396) [SQL] Fix LDAP login exception \u2018error code 49 - invalid credentials\u2019\n* [[SPARK-39548]](https://issues.apache.org/jira/browse/SPARK-39548) [SQL] CreateView Command with a window clause query hit a wrong window definition not found issue\n* [[SPARK-39575]](https://issues.apache.org/jira/browse/SPARK-39575) [AVRO] add ByteBuffer#rewind after ByteBuffer#get in Avr\u2026\n* [[SPARK-39543]](https://issues.apache.org/jira/browse/SPARK-39543) The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1\n* [[SPARK-39564]](https://issues.apache.org/jira/browse/SPARK-39564) [SS] Expose the information of catalog table to the logical plan in streaming query\n* [[SPARK-39582]](https://issues.apache.org/jira/browse/SPARK-39582) [SQL] Fix \u201cSince\u201d marker for `array_agg`\n* [[SPARK-39388]](https://issues.apache.org/jira/browse/SPARK-39388) [SQL] Reuse `orcSchema` when push down Orc predicates\n* [[SPARK-39511]](https://issues.apache.org/jira/browse/SPARK-39511) [SQL] Enhance push down local limit 1 for right side of left semi/anti join if join condition is empty\n* [[SPARK-38614]](https://issues.apache.org/jira/browse/SPARK-38614) [SQL] Don\u2019t push down limit through window that\u2019s using percent\\_rank\n* [[SPARK-39551]](https://issues.apache.org/jira/browse/SPARK-39551) [SQL] Add AQE invalid plan check\n* [[SPARK-39383]](https://issues.apache.org/jira/browse/SPARK-39383) [SQL] Support DEFAULT columns in ALTER TABLE ADD COLUMNS to V2 data sources\n* [[SPARK-39538]](https://issues.apache.org/jira/browse/SPARK-39538) [SQL] Avoid creating unnecessary SLF4J Logger\n* 
[[SPARK-39383]](https://issues.apache.org/jira/browse/SPARK-39383) [SQL] Manual backport to DBR 11.x: Refactor DEFAULT column support to skip passing the primary Analyzer around\n* [[SPARK-39397]](https://issues.apache.org/jira/browse/SPARK-39397) [SQL] Relax AliasAwareOutputExpression to support alias with expression\n* [[SPARK-39496]](https://issues.apache.org/jira/browse/SPARK-39496) [SQL] Handle null struct in `Inline.eval`\n* [[SPARK-39545]](https://issues.apache.org/jira/browse/SPARK-39545) [SQL] Override `concat` method for `ExpressionSet` in Scala 2.13 to improve the performance\n* [[SPARK-39340]](https://issues.apache.org/jira/browse/SPARK-39340) [SQL] DS v2 agg pushdown should allow dots in the name of top-level columns\n* [[SPARK-39488]](https://issues.apache.org/jira/browse/SPARK-39488) [SQL] Simplify the error handling of TempResolvedColumn\n* [[SPARK-38846]](https://issues.apache.org/jira/browse/SPARK-38846) [SQL] Add explicit data mapping between Teradata Numeric Type and Spark DecimalType\n* [[SPARK-39520]](https://issues.apache.org/jira/browse/SPARK-39520) [SQL] Override `--` method for `ExpressionSet` in Scala 2.13\n* [[SPARK-39470]](https://issues.apache.org/jira/browse/SPARK-39470) [SQL] Support cast of ANSI intervals to decimals\n* [[SPARK-39477]](https://issues.apache.org/jira/browse/SPARK-39477) [SQL] Remove \u201cNumber of queries\u201d info from the golden files of SQLQueryTestSuite\n* [[SPARK-39419]](https://issues.apache.org/jira/browse/SPARK-39419) [SQL] Fix ArraySort to throw an exception when the comparator returns null\n* [[SPARK-39061]](https://issues.apache.org/jira/browse/SPARK-39061) [SQL] Set nullable correctly for `Inline` output attributes\n* [[SPARK-39320]](https://issues.apache.org/jira/browse/SPARK-39320) [SQL] Support aggregate function `MEDIAN`\n* [[SPARK-39261]](https://issues.apache.org/jira/browse/SPARK-39261) [CORE] Improve newline formatting for error messages\n* 
[[SPARK-39355]](https://issues.apache.org/jira/browse/SPARK-39355) [SQL] Single column uses quoted to construct UnresolvedAttribute\n* [[SPARK-39351]](https://issues.apache.org/jira/browse/SPARK-39351) [SQL] SHOW CREATE TABLE should redact properties\n* [[SPARK-37623]](https://issues.apache.org/jira/browse/SPARK-37623) [SQL] Support ANSI Aggregate Function: regr\\_intercept\n* [[SPARK-39374]](https://issues.apache.org/jira/browse/SPARK-39374) [SQL] Improve error message for user specified column list\n* [[SPARK-39255]](https://issues.apache.org/jira/browse/SPARK-39255) [SQL][3.3] Improve error messages\n* [[SPARK-39321]](https://issues.apache.org/jira/browse/SPARK-39321) [SQL] Refactor TryCast to use RuntimeReplaceable\n* [[SPARK-39406]](https://issues.apache.org/jira/browse/SPARK-39406) [PYTHON] Accept NumPy array in createDataFrame\n* [[SPARK-39267]](https://issues.apache.org/jira/browse/SPARK-39267) [SQL] Clean up dsl unnecessary symbol\n* [[SPARK-39171]](https://issues.apache.org/jira/browse/SPARK-39171) [SQL] Unify the Cast expression\n* [[SPARK-28330]](https://issues.apache.org/jira/browse/SPARK-28330) [SQL] Support ANSI SQL: result offset clause in query expression\n* [[SPARK-39203]](https://issues.apache.org/jira/browse/SPARK-39203) [SQL] Rewrite table location to absolute URI based on database URI\n* [[SPARK-39313]](https://issues.apache.org/jira/browse/SPARK-39313) [SQL] `toCatalystOrdering` should fail if V2Expression can not be translated\n* [[SPARK-39301]](https://issues.apache.org/jira/browse/SPARK-39301) [SQL][PYTHON] Leverage LocalRelation and respect Arrow batch size in createDataFrame with Arrow optimization\n* [[SPARK-39400]](https://issues.apache.org/jira/browse/SPARK-39400) [SQL] spark-sql should remove hive resource dir in all case\n\n", "chunk_id": "989ece59b4dbab45b54544abde795f02", "url": "https://docs.databricks.com/archive/runtime-release-notes/11.1.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks 
Runtime release notes\n#### Databricks Runtime 11.1 (unsupported)\n##### Maintenance updates\n\nSee [Databricks Runtime 11.1 maintenance updates](https://docs.databricks.com/archive/runtime-release-notes/maintenance-updates-archive.html#111).\n\n", "chunk_id": "6c51d25a2557e84a9a95e02962c48743", "url": "https://docs.databricks.com/archive/runtime-release-notes/11.1.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 11.1 (unsupported)\n##### System environment\n\n* **Operating System**: Ubuntu 20.04.4 LTS\n* **Java**: Zulu 8.56.0.21-CA-linux64\n* **Scala**: 2.12.14\n* **Python**: 3.9.5\n* **R**: 4.1.3\n* **Delta Lake**: 1.2.1 \n### Installed Python libraries \n| Library | Version | Library | Version | Library | Version |\n| --- | --- | --- | --- | --- | --- |\n| Antergos Linux | 2015.10 (ISO-Rolling) | argon2-cffi | 20.1.0 | async-generator | 1.10 |\n| attrs | 21.2.0 | backcall | 0.2.0 | backports.entry-points-selectable | 1.1.1 |\n| black | 22.3.0 | bleach | 4.0.0 | boto3 | 1.21.18 |\n| botocore | 1.24.18 | certifi | 2021.10.8 | cffi | 1.14.6 |\n| chardet | 4.0.0 | charset-normalizer | 2.0.4 | click | 8.0.3 |\n| cryptography | 3.4.8 | cycler | 0.10.0 | Cython | 0.29.24 |\n| dbus-python | 1.2.16 | debugpy | 1.4.1 | decorator | 5.1.0 |\n| defusedxml | 0.7.1 | distlib | 0.3.5 | distro-info | 0.23ubuntu1 |\n| entrypoints | 0.3 | facets-overview | 1.0.0 | filelock | 3.8.0 |\n| idna | 3.2 | ipykernel | 6.12.1 | ipython | 7.32.0 |\n| ipython-genutils | 0.2.0 | ipywidgets | 7.7.0 | jedi | 0.18.0 |\n| Jinja2 | 2.11.3 | jmespath | 0.10.0 | joblib | 1.0.1 |\n| jsonschema | 3.2.0 | jupyter-client | 6.1.12 | jupyter-core | 4.8.1 |\n| jupyterlab-pygments | 0.1.2 | jupyterlab-widgets | 1.0.0 | kiwisolver | 1.3.1 |\n| MarkupSafe | 2.0.1 | matplotlib | 3.4.3 | matplotlib-inline | 0.1.2 |\n| mistune | 0.8.4 | mypy-extensions | 0.4.3 | nbclient | 0.5.3 |\n| nbconvert | 6.1.0 | nbformat | 5.1.3 | 
nest-asyncio | 1.5.1 |\n| notebook | 6.4.5 | numpy | 1.20.3 | packaging | 21.0 |\n| pandas | 1.3.4 | pandocfilters | 1.4.3 | parso | 0.8.2 |\n| pathspec | 0.9.0 | patsy | 0.5.2 | pexpect | 4.8.0 |\n| pickleshare | 0.7.5 | Pillow | 8.4.0 | pip | 21.2.4 |\n| platformdirs | 2.5.2 | plotly | 5.9.0 | prometheus-client | 0.11.0 |\n| prompt-toolkit | 3.0.20 | protobuf | 4.21.5 | psutil | 5.8.0 |\n| psycopg2 | 2.9.3 | ptyprocess | 0.7.0 | pyarrow | 7.0.0 |\n| pycparser | 2.20 | Pygments | 2.10.0 | PyGObject | 3.36.0 |\n| pyodbc | 4.0.31 | pyparsing | 3.0.4 | pyrsistent | 0.18.0 |\n| python-apt | 2.0.0+ubuntu0.20.4.7 | python-dateutil | 2.8.2 | pytz | 2021.3 |\n| pyzmq | 22.2.1 | requests | 2.26.0 | requests-unixsocket | 0.2.0 |\n| s3transfer | 0.5.2 | scikit-learn | 0.24.2 | scipy | 1.7.1 |\n| seaborn | 0.11.2 | Send2Trash | 1.8.0 | setuptools | 58.0.4 |\n| six | 1.16.0 | ssh-import-id | 5.10 | statsmodels | 0.12.2 |\n| tenacity | 8.0.1 | terminado | 0.9.4 | testpath | 0.5.0 |\n| threadpoolctl | 2.2.0 | tokenize-rt | 4.2.1 | tomli | 2.0.1 |\n| tornado | 6.1 | traitlets | 5.1.0 | typing-extensions | 3.10.0.2 |\n| unattended-upgrades | 0.1 | urllib3 | 1.26.7 | virtualenv | 20.8.0 |\n| wcwidth | 0.2.5 | webencodings | 0.5.1 | wheel | 0.37.0 |\n| widgetsnbextension | 3.6.0 | | | | | \n### Installed R libraries \nR libraries are installed from the Microsoft CRAN snapshot on 2022-08-15. 
\n| Library | Version | Library | Version | Library | Version |\n| --- | --- | --- | --- | --- | --- |\n| askpass | 1.1 | assertthat | 0.2.1 | backports | 1.4.1 |\n| base | 4.1.3 | base64enc | 0.1-3 | bit | 4.0.4 |\n| bit64 | 4.0.5 | blob | 1.2.3 | boot | 1.3-28 |\n| brew | 1.0-7 | brio | 1.1.3 | broom | 1.0.0 |\n| bslib | 0.4.0 | cachem | 1.0.6 | callr | 3.7.1 |\n| caret | 6.0-93 | cellranger | 1.1.0 | chron | 2.3-57 |\n| class | 7.3-20 | cli | 3.3.0 | clipr | 0.8.0 |\n| cluster | 2.1.3 | codetools | 0.2-18 | colorspace | 2.0-3 |\n| commonmark | 1.8.0 | compiler | 4.1.3 | config | 0.3.1 |\n| cpp11 | 0.4.2 | crayon | 1.5.1 | credentials | 1.3.2 |\n| curl | 4.3.2 | data.table | 1.14.2 | datasets | 4.1.3 |\n| DBI | 1.1.3 | dbplyr | 2.2.1 | desc | 1.4.1 |\n| devtools | 2.4.4 | diffobj | 0.3.5 | digest | 0.6.29 |\n| downlit | 0.4.2 | dplyr | 1.0.9 | dtplyr | 1.2.1 |\n| e1071 | 1.7-11 | ellipsis | 0.3.2 | evaluate | 0.16 |\n| fansi | 1.0.3 | farver | 2.1.1 | fastmap | 1.1.0 |\n| fontawesome | 0.3.0 | forcats | 0.5.1 | foreach | 1.5.2 |\n| foreign | 0.8-82 | forge | 0.2.0 | fs | 1.5.2 |\n| future | 1.27.0 | future.apply | 1.9.0 | gargle | 1.2.0 |\n| generics | 0.1.3 | gert | 1.7.0 | ggplot2 | 3.3.6 |\n| gh | 1.3.0 | gitcreds | 0.1.1 | glmnet | 4.1-4 |\n| globals | 0.16.0 | glue | 1.6.2 | googledrive | 2.0.0 |\n| googlesheets4 | 1.0.1 | gower | 1.0.0 | graphics | 4.1.3 |\n| grDevices | 4.1.3 | grid | 4.1.3 | gridExtra | 2.3 |\n| gsubfn | 0.7 | gtable | 0.3.0 | hardhat | 1.2.0 |\n| haven | 2.5.0 | highr | 0.9 | hms | 1.1.1 |\n| htmltools | 0.5.3 | htmlwidgets | 1.5.4 | httpuv | 1.6.5 |\n| httr | 1.4.3 | ids | 1.0.1 | ini | 0.3.1 |\n| ipred | 0.9-13 | isoband | 0.2.5 | iterators | 1.0.14 |\n| jquerylib | 0.1.4 | jsonlite | 1.8.0 | KernSmooth | 2.23-20 |\n| knitr | 1.39 | labeling | 0.4.2 | later | 1.3.0 |\n| lattice | 0.20-45 | lava | 1.6.10 | lifecycle | 1.0.1 |\n| listenv | 0.8.0 | lubridate | 1.8.0 | magrittr | 2.0.3 |\n| markdown | 1.1 | MASS | 7.3-56 | Matrix | 1.4-1 
|\n| memoise | 2.0.1 | methods | 4.1.3 | mgcv | 1.8-40 |\n| mime | 0.12 | miniUI | 0.1.1.1 | ModelMetrics | 1.2.2.2 |\n| modelr | 0.1.8 | munsell | 0.5.0 | nlme | 3.1-157 |\n| nnet | 7.3-17 | numDeriv | 2016.8-1.1 | openssl | 2.0.2 |\n| parallel | 4.1.3 | parallelly | 1.32.1 | pillar | 1.8.0 |\n| pkgbuild | 1.3.1 | pkgconfig | 2.0.3 | pkgdown | 2.0.6 |\n| pkgload | 1.3.0 | plogr | 0.2.0 | plyr | 1.8.7 |\n| praise | 1.0.0 | prettyunits | 1.1.1 | pROC | 1.18.0 |\n| processx | 3.7.0 | prodlim | 2019.11.13 | profvis | 0.3.7 |\n| progress | 1.2.2 | progressr | 0.10.1 | promises | 1.2.0.1 |\n| proto | 1.0.0 | proxy | 0.4-27 | ps | 1.7.1 |\n| purrr | 0.3.4 | r2d3 | 0.2.6 | R6 | 2.5.1 |\n| ragg | 1.2.2 | randomForest | 4.7-1.1 | rappdirs | 0.3.3 |\n| rcmdcheck | 1.4.0 | RColorBrewer | 1.1-3 | Rcpp | 1.0.9 |\n| RcppEigen | 0.3.3.9.2 | readr | 2.1.2 | readxl | 1.4.0 |\n| recipes | 1.0.1 | rematch | 1.0.1 | rematch2 | 2.1.2 |\n| remotes | 2.4.2 | reprex | 2.0.1 | reshape2 | 1.4.4 |\n| rlang | 1.0.4 | rmarkdown | 2.14 | RODBC | 1.3-19 |\n| roxygen2 | 7.2.1 | rpart | 4.1.16 | rprojroot | 2.0.3 |\n| Rserve | 1.8-11 | RSQLite | 2.2.15 | rstudioapi | 0.13 |\n| rversions | 2.1.1 | rvest | 1.0.2 | sass | 0.4.2 |\n| scales | 1.2.0 | selectr | 0.4-2 | sessioninfo | 1.2.2 |\n| shape | 1.4.6 | shiny | 1.7.2 | sourcetools | 0.1.7 |\n| sparklyr | 1.7.7 | SparkR | 3.3.0 | spatial | 7.3-11 |\n| splines | 4.1.3 | sqldf | 0.4-11 | SQUAREM | 2021.1 |\n| stats | 4.1.3 | stats4 | 4.1.3 | stringi | 1.7.8 |\n| stringr | 1.4.0 | survival | 3.2-13 | sys | 3.4 |\n| systemfonts | 1.0.4 | tcltk | 4.1.3 | testthat | 3.1.4 |\n| textshaping | 0.3.6 | tibble | 3.1.8 | tidyr | 1.2.0 |\n| tidyselect | 1.1.2 | tidyverse | 1.3.2 | timeDate | 4021.104 |\n| tinytex | 0.40 | tools | 4.1.3 | tzdb | 0.3.0 |\n| urlchecker | 1.0.1 | usethis | 2.1.6 | utf8 | 1.2.2 |\n| utils | 4.1.3 | uuid | 1.1-0 | vctrs | 0.4.1 |\n| viridisLite | 0.4.0 | vroom | 1.5.7 | waldo | 0.4.0 |\n| whisker | 0.4 | withr | 2.5.0 | xfun | 0.32 
|\n| xml2 | 1.3.3 | xopen | 1.0.0 | xtable | 1.8-4 |\n| yaml | 2.3.5 | zip | 2.2.0 | | | \n### Installed Java and Scala libraries (Scala 2.12 cluster version) \n| Group ID | Artifact ID | Version |\n| --- | --- | --- |\n| antlr | antlr | 2.7.7 |\n| com.amazonaws | amazon-kinesis-client | 1.12.0 |\n| com.amazonaws | aws-java-sdk-autoscaling | 1.12.189 |\n| com.amazonaws | aws-java-sdk-cloudformation | 1.12.189 |\n| com.amazonaws | aws-java-sdk-cloudfront | 1.12.189 |\n| com.amazonaws | aws-java-sdk-cloudhsm | 1.12.189 |\n| com.amazonaws | aws-java-sdk-cloudsearch | 1.12.189 |\n| com.amazonaws | aws-java-sdk-cloudtrail | 1.12.189 |\n| com.amazonaws | aws-java-sdk-cloudwatch | 1.12.189 |\n| com.amazonaws | aws-java-sdk-cloudwatchmetrics | 1.12.189 |\n| com.amazonaws | aws-java-sdk-codedeploy | 1.12.189 |\n| com.amazonaws | aws-java-sdk-cognitoidentity | 1.12.189 |\n| com.amazonaws | aws-java-sdk-cognitosync | 1.12.189 |\n| com.amazonaws | aws-java-sdk-config | 1.12.189 |\n| com.amazonaws | aws-java-sdk-core | 1.12.189 |\n| com.amazonaws | aws-java-sdk-datapipeline | 1.12.189 |\n| com.amazonaws | aws-java-sdk-directconnect | 1.12.189 |\n| com.amazonaws | aws-java-sdk-directory | 1.12.189 |\n| com.amazonaws | aws-java-sdk-dynamodb | 1.12.189 |\n| com.amazonaws | aws-java-sdk-ec2 | 1.12.189 |\n| com.amazonaws | aws-java-sdk-ecs | 1.12.189 |\n| com.amazonaws | aws-java-sdk-efs | 1.12.189 |\n| com.amazonaws | aws-java-sdk-elasticache | 1.12.189 |\n| com.amazonaws | aws-java-sdk-elasticbeanstalk | 1.12.189 |\n| com.amazonaws | aws-java-sdk-elasticloadbalancing | 1.12.189 |\n| com.amazonaws | aws-java-sdk-elastictranscoder | 1.12.189 |\n| com.amazonaws | aws-java-sdk-emr | 1.12.189 |\n| com.amazonaws | aws-java-sdk-glacier | 1.12.189 |\n| com.amazonaws | aws-java-sdk-glue | 1.12.189 |\n| com.amazonaws | aws-java-sdk-iam | 1.12.189 |\n| com.amazonaws | aws-java-sdk-importexport | 1.12.189 |\n| com.amazonaws | aws-java-sdk-kinesis | 1.12.189 |\n| com.amazonaws | 
aws-java-sdk-kms | 1.12.189 |\n| com.amazonaws | aws-java-sdk-lambda | 1.12.189 |\n| com.amazonaws | aws-java-sdk-logs | 1.12.189 |\n| com.amazonaws | aws-java-sdk-machinelearning | 1.12.189 |\n| com.amazonaws | aws-java-sdk-opsworks | 1.12.189 |\n| com.amazonaws | aws-java-sdk-rds | 1.12.189 |\n| com.amazonaws | aws-java-sdk-redshift | 1.12.189 |\n| com.amazonaws | aws-java-sdk-route53 | 1.12.189 |\n| com.amazonaws | aws-java-sdk-s3 | 1.12.189 |\n| com.amazonaws | aws-java-sdk-ses | 1.12.189 |\n| com.amazonaws | aws-java-sdk-simpledb | 1.12.189 |\n| com.amazonaws | aws-java-sdk-simpleworkflow | 1.12.189 |\n| com.amazonaws | aws-java-sdk-sns | 1.12.189 |\n| com.amazonaws | aws-java-sdk-sqs | 1.12.189 |\n| com.amazonaws | aws-java-sdk-ssm | 1.12.189 |\n| com.amazonaws | aws-java-sdk-storagegateway | 1.12.189 |\n| com.amazonaws | aws-java-sdk-sts | 1.12.189 |\n| com.amazonaws | aws-java-sdk-support | 1.12.189 |\n| com.amazonaws | aws-java-sdk-swf-libraries | 1.11.22 |\n| com.amazonaws | aws-java-sdk-workspaces | 1.12.189 |\n| com.amazonaws | jmespath-java | 1.12.189 |\n| com.chuusai | shapeless\\_2.12 | 2.3.3 |\n| com.clearspring.analytics | stream | 2.9.6 |\n| com.databricks | Rserve | 1.8-3 |\n| com.databricks | jets3t | 0.7.1-0 |\n| com.databricks.scalapb | compilerplugin\\_2.12 | 0.4.15-10 |\n| com.databricks.scalapb | scalapb-runtime\\_2.12 | 0.4.15-10 |\n| com.esotericsoftware | kryo-shaded | 4.0.2 |\n| com.esotericsoftware | minlog | 1.3.0 |\n| com.fasterxml | classmate | 1.3.4 |\n| com.fasterxml.jackson.core | jackson-annotations | 2.13.3 |\n| com.fasterxml.jackson.core | jackson-core | 2.13.3 |\n| com.fasterxml.jackson.core | jackson-databind | 2.13.3 |\n| com.fasterxml.jackson.dataformat | jackson-dataformat-cbor | 2.13.3 |\n| com.fasterxml.jackson.datatype | jackson-datatype-joda | 2.13.3 |\n| com.fasterxml.jackson.datatype | jackson-datatype-jsr310 | 2.13.3 |\n| com.fasterxml.jackson.module | jackson-module-paranamer | 2.13.3 |\n| 
com.fasterxml.jackson.module | jackson-module-scala\\_2.12 | 2.13.3 |\n| com.github.ben-manes.caffeine | caffeine | 2.3.4 |\n| com.github.fommil | jniloader | 1.1 |\n| com.github.fommil.netlib | core | 1.1.2 |\n| com.github.fommil.netlib | native\\_ref-java | 1.1 |\n| com.github.fommil.netlib | native\\_ref-java-natives | 1.1 |\n| com.github.fommil.netlib | native\\_system-java | 1.1 |\n| com.github.fommil.netlib | native\\_system-java-natives | 1.1 |\n| com.github.fommil.netlib | netlib-native\\_ref-linux-x86\\_64-natives | 1.1 |\n| com.github.fommil.netlib | netlib-native\\_system-linux-x86\\_64-natives | 1.1 |\n| com.github.luben | zstd-jni | 1.5.2-1 |\n| com.github.wendykierp | JTransforms | 3.1 |\n| com.google.code.findbugs | jsr305 | 3.0.0 |\n| com.google.code.gson | gson | 2.8.6 |\n| com.google.crypto.tink | tink | 1.6.1 |\n| com.google.flatbuffers | flatbuffers-java | 1.12.0 |\n| com.google.guava | guava | 15.0 |\n| com.google.protobuf | protobuf-java | 2.6.1 |\n| com.h2database | h2 | 2.0.204 |\n| com.helger | profiler | 1.1.1 |\n| com.jcraft | jsch | 0.1.50 |\n| com.jolbox | bonecp | 0.8.0.RELEASE |\n| com.lihaoyi | sourcecode\\_2.12 | 0.1.9 |\n| com.microsoft.azure | azure-data-lake-store-sdk | 2.3.9 |\n| com.ning | compress-lzf | 1.1 |\n| com.sun.mail | javax.mail | 1.5.2 |\n| com.tdunning | json | 1.8 |\n| com.thoughtworks.paranamer | paranamer | 2.8 |\n| com.trueaccord.lenses | lenses\\_2.12 | 0.4.12 |\n| com.twitter | chill-java | 0.10.0 |\n| com.twitter | chill\\_2.12 | 0.10.0 |\n| com.twitter | util-app\\_2.12 | 7.1.0 |\n| com.twitter | util-core\\_2.12 | 7.1.0 |\n| com.twitter | util-function\\_2.12 | 7.1.0 |\n| com.twitter | util-jvm\\_2.12 | 7.1.0 |\n| com.twitter | util-lint\\_2.12 | 7.1.0 |\n| com.twitter | util-registry\\_2.12 | 7.1.0 |\n| com.twitter | util-stats\\_2.12 | 7.1.0 |\n| com.typesafe | config | 1.2.1 |\n| com.typesafe.scala-logging | scala-logging\\_2.12 | 3.7.2 |\n| com.uber | h3 | 3.7.0 |\n| com.univocity | univocity-parsers | 
2.9.1 |\n| com.zaxxer | HikariCP | 4.0.3 |\n| commons-cli | commons-cli | 1.5.0 |\n| commons-codec | commons-codec | 1.15 |\n| commons-collections | commons-collections | 3.2.2 |\n| commons-dbcp | commons-dbcp | 1.4 |\n| commons-fileupload | commons-fileupload | 1.3.3 |\n| commons-httpclient | commons-httpclient | 3.1 |\n| commons-io | commons-io | 2.11.0 |\n| commons-lang | commons-lang | 2.6 |\n| commons-logging | commons-logging | 1.1.3 |\n| commons-pool | commons-pool | 1.5.4 |\n| dev.ludovic.netlib | arpack | 2.2.1 |\n| dev.ludovic.netlib | blas | 2.2.1 |\n| dev.ludovic.netlib | lapack | 2.2.1 |\n| hadoop3 | jets3t-0.7 | liball\\_deps\\_2.12 |\n| info.ganglia.gmetric4j | gmetric4j | 1.0.10 |\n| io.airlift | aircompressor | 0.21 |\n| io.delta | delta-sharing-spark\\_2.12 | 0.5.0 |\n| io.dropwizard.metrics | metrics-core | 4.1.1 |\n| io.dropwizard.metrics | metrics-graphite | 4.1.1 |\n| io.dropwizard.metrics | metrics-healthchecks | 4.1.1 |\n| io.dropwizard.metrics | metrics-jetty9 | 4.1.1 |\n| io.dropwizard.metrics | metrics-jmx | 4.1.1 |\n| io.dropwizard.metrics | metrics-json | 4.1.1 |\n| io.dropwizard.metrics | metrics-jvm | 4.1.1 |\n| io.dropwizard.metrics | metrics-servlets | 4.1.1 |\n| io.netty | netty-all | 4.1.74.Final |\n| io.netty | netty-buffer | 4.1.74.Final |\n| io.netty | netty-codec | 4.1.74.Final |\n| io.netty | netty-common | 4.1.74.Final |\n| io.netty | netty-handler | 4.1.74.Final |\n| io.netty | netty-resolver | 4.1.74.Final |\n| io.netty | netty-tcnative-classes | 2.0.48.Final |\n| io.netty | netty-transport | 4.1.74.Final |\n| io.netty | netty-transport-classes-epoll | 4.1.74.Final |\n| io.netty | netty-transport-classes-kqueue | 4.1.74.Final |\n| io.netty | netty-transport-native-epoll-linux-aarch\\_64 | 4.1.74.Final |\n| io.netty | netty-transport-native-epoll-linux-x86\\_64 | 4.1.74.Final |\n| io.netty | netty-transport-native-kqueue-osx-aarch\\_64 | 4.1.74.Final |\n| io.netty | netty-transport-native-kqueue-osx-x86\\_64 | 4.1.74.Final 
|\n| io.netty | netty-transport-native-unix-common | 4.1.74.Final |\n| io.prometheus | simpleclient | 0.7.0 |\n| io.prometheus | simpleclient\\_common | 0.7.0 |\n| io.prometheus | simpleclient\\_dropwizard | 0.7.0 |\n| io.prometheus | simpleclient\\_pushgateway | 0.7.0 |\n| io.prometheus | simpleclient\\_servlet | 0.7.0 |\n| io.prometheus.jmx | collector | 0.12.0 |\n| jakarta.annotation | jakarta.annotation-api | 1.3.5 |\n| jakarta.servlet | jakarta.servlet-api | 4.0.3 |\n| jakarta.validation | jakarta.validation-api | 2.0.2 |\n| jakarta.ws.rs | jakarta.ws.rs-api | 2.1.6 |\n| javax.activation | activation | 1.1.1 |\n| javax.annotation | javax.annotation-api | 1.3.2 |\n| javax.el | javax.el-api | 2.2.4 |\n| javax.jdo | jdo-api | 3.0.1 |\n| javax.transaction | jta | 1.1 |\n| javax.transaction | transaction-api | 1.1 |\n| javax.xml.bind | jaxb-api | 2.2.11 |\n| javolution | javolution | 5.5.1 |\n| jline | jline | 2.14.6 |\n| joda-time | joda-time | 2.10.13 |\n| mvn | hadoop3 | liball\\_deps\\_2.12 |\n| net.java.dev.jna | jna | 5.8.0 |\n| net.razorvine | pickle | 1.2 |\n| net.sf.jpam | jpam | 1.1 |\n| net.sf.opencsv | opencsv | 2.3 |\n| net.sf.supercsv | super-csv | 2.2.0 |\n| net.snowflake | snowflake-ingest-sdk | 0.9.6 |\n| net.snowflake | snowflake-jdbc | 3.13.14 |\n| net.snowflake | spark-snowflake\\_2.12 | 2.10.0-spark\\_3.2 |\n| net.sourceforge.f2j | arpack\\_combined\\_all | 0.1 |\n| org.acplt.remotetea | remotetea-oncrpc | 1.1.2 |\n| org.antlr | ST4 | 4.0.4 |\n| org.antlr | antlr-runtime | 3.5.2 |\n| org.antlr | antlr4-runtime | 4.8 |\n| org.antlr | stringtemplate | 3.2.1 |\n| org.apache.ant | ant | 1.9.2 |\n| org.apache.ant | ant-jsch | 1.9.2 |\n| org.apache.ant | ant-launcher | 1.9.2 |\n| org.apache.arrow | arrow-format | 7.0.0 |\n| org.apache.arrow | arrow-memory-core | 7.0.0 |\n| org.apache.arrow | arrow-memory-netty | 7.0.0 |\n| org.apache.arrow | arrow-vector | 7.0.0 |\n| org.apache.avro | avro | 1.11.0 |\n| org.apache.avro | avro-ipc | 1.11.0 |\n| 
org.apache.avro | avro-mapred | 1.11.0 |\n| org.apache.commons | commons-collections4 | 4.4 |\n| org.apache.commons | commons-compress | 1.21 |\n| org.apache.commons | commons-crypto | 1.1.0 |\n| org.apache.commons | commons-lang3 | 3.12.0 |\n| org.apache.commons | commons-math3 | 3.6.1 |\n| org.apache.commons | commons-text | 1.9 |\n| org.apache.curator | curator-client | 2.13.0 |\n| org.apache.curator | curator-framework | 2.13.0 |\n| org.apache.curator | curator-recipes | 2.13.0 |\n| org.apache.derby | derby | 10.14.2.0 |\n| org.apache.hadoop | hadoop-client-api | 3.3.2-databricks |\n| org.apache.hadoop | hadoop-client-runtime | 3.3.2 |\n| org.apache.hive | hive-beeline | 2.3.9 |\n| org.apache.hive | hive-cli | 2.3.9 |\n| org.apache.hive | hive-jdbc | 2.3.9 |\n| org.apache.hive | hive-llap-client | 2.3.9 |\n| org.apache.hive | hive-llap-common | 2.3.9 |\n| org.apache.hive | hive-serde | 2.3.9 |\n| org.apache.hive | hive-shims | 2.3.9 |\n| org.apache.hive | hive-storage-api | 2.7.2 |\n| org.apache.hive.shims | hive-shims-0.23 | 2.3.9 |\n| org.apache.hive.shims | hive-shims-common | 2.3.9 |\n| org.apache.hive.shims | hive-shims-scheduler | 2.3.9 |\n| org.apache.httpcomponents | httpclient | 4.5.13 |\n| org.apache.httpcomponents | httpcore | 4.4.14 |\n| org.apache.ivy | ivy | 2.5.0 |\n| org.apache.logging.log4j | log4j-1.2-api | 2.17.2 |\n| org.apache.logging.log4j | log4j-api | 2.17.2 |\n| org.apache.logging.log4j | log4j-core | 2.17.2 |\n| org.apache.logging.log4j | log4j-slf4j-impl | 2.17.2 |\n| org.apache.mesos | mesos-shaded-protobuf | 1.4.0 |\n| org.apache.orc | orc-core | 1.7.5 |\n| org.apache.orc | orc-mapreduce | 1.7.5 |\n| org.apache.orc | orc-shims | 1.7.5 |\n| org.apache.parquet | parquet-column | 1.12.0-databricks-0004 |\n| org.apache.parquet | parquet-common | 1.12.0-databricks-0004 |\n| org.apache.parquet | parquet-encoding | 1.12.0-databricks-0004 |\n| org.apache.parquet | parquet-format-structures | 1.12.0-databricks-0004 |\n| org.apache.parquet | 
parquet-hadoop | 1.12.0-databricks-0004 |\n| org.apache.parquet | parquet-jackson | 1.12.0-databricks-0004 |\n| org.apache.thrift | libfb303 | 0.9.3 |\n| org.apache.thrift | libthrift | 0.12.0 |\n| org.apache.xbean | xbean-asm9-shaded | 4.20 |\n| org.apache.yetus | audience-annotations | 0.5.0 |\n| org.apache.zookeeper | zookeeper | 3.6.2 |\n| org.apache.zookeeper | zookeeper-jute | 3.6.2 |\n| org.checkerframework | checker-qual | 3.5.0 |\n| org.codehaus.jackson | jackson-core-asl | 1.9.13 |\n| org.codehaus.jackson | jackson-mapper-asl | 1.9.13 |\n| org.codehaus.janino | commons-compiler | 3.0.16 |\n| org.codehaus.janino | janino | 3.0.16 |\n| org.datanucleus | datanucleus-api-jdo | 4.2.4 |\n| org.datanucleus | datanucleus-core | 4.1.17 |\n| org.datanucleus | datanucleus-rdbms | 4.1.19 |\n| org.datanucleus | javax.jdo | 3.2.0-m3 |\n| org.eclipse.jetty | jetty-client | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-continuation | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-http | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-io | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-jndi | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-plus | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-proxy | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-security | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-server | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-servlet | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-servlets | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-util | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-util-ajax | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-webapp | 9.4.46.v20220331 |\n| org.eclipse.jetty | jetty-xml | 9.4.46.v20220331 |\n| org.eclipse.jetty.websocket | websocket-api | 9.4.46.v20220331 |\n| org.eclipse.jetty.websocket | websocket-client | 9.4.46.v20220331 |\n| org.eclipse.jetty.websocket | websocket-common | 9.4.46.v20220331 |\n| org.eclipse.jetty.websocket | websocket-server | 9.4.46.v20220331 |\n| 
org.eclipse.jetty.websocket | websocket-servlet | 9.4.46.v20220331 |\n| org.fusesource.leveldbjni | leveldbjni-all | 1.8 |\n| org.glassfish.hk2 | hk2-api | 2.6.1 |\n| org.glassfish.hk2 | hk2-locator | 2.6.1 |\n| org.glassfish.hk2 | hk2-utils | 2.6.1 |\n| org.glassfish.hk2 | osgi-resource-locator | 1.0.3 |\n| org.glassfish.hk2.external | aopalliance-repackaged | 2.6.1 |\n| org.glassfish.hk2.external | jakarta.inject | 2.6.1 |\n| org.glassfish.jersey.containers | jersey-container-servlet | 2.34 |\n| org.glassfish.jersey.containers | jersey-container-servlet-core | 2.34 |\n| org.glassfish.jersey.core | jersey-client | 2.34 |\n| org.glassfish.jersey.core | jersey-common | 2.34 |\n| org.glassfish.jersey.core | jersey-server | 2.34 |\n| org.glassfish.jersey.inject | jersey-hk2 | 2.34 |\n| org.hibernate.validator | hibernate-validator | 6.1.0.Final |\n| org.javassist | javassist | 3.25.0-GA |\n| org.jboss.logging | jboss-logging | 3.3.2.Final |\n| org.jdbi | jdbi | 2.63.1 |\n| org.jetbrains | annotations | 17.0.0 |\n| org.joda | joda-convert | 1.7 |\n| org.jodd | jodd-core | 3.5.2 |\n| org.json4s | json4s-ast\\_2.12 | 3.7.0-M11 |\n| org.json4s | json4s-core\\_2.12 | 3.7.0-M11 |\n| org.json4s | json4s-jackson\\_2.12 | 3.7.0-M11 |\n| org.json4s | json4s-scalap\\_2.12 | 3.7.0-M11 |\n| org.lz4 | lz4-java | 1.8.0 |\n| org.mariadb.jdbc | mariadb-java-client | 2.7.4 |\n| org.mlflow | mlflow-spark | 1.27.0 |\n| org.objenesis | objenesis | 2.5.1 |\n| org.postgresql | postgresql | 42.3.3 |\n| org.roaringbitmap | RoaringBitmap | 0.9.25 |\n| org.roaringbitmap | shims | 0.9.25 |\n| org.rocksdb | rocksdbjni | 6.24.2 |\n| org.rosuda.REngine | REngine | 2.1.0 |\n| org.scala-lang | scala-compiler\\_2.12 | 2.12.14 |\n| org.scala-lang | scala-library\\_2.12 | 2.12.14 |\n| org.scala-lang | scala-reflect\\_2.12 | 2.12.14 |\n| org.scala-lang.modules | scala-collection-compat\\_2.12 | 2.4.3 |\n| org.scala-lang.modules | scala-parser-combinators\\_2.12 | 1.1.2 |\n| org.scala-lang.modules | 
scala-xml\\_2.12 | 1.2.0 |\n| org.scala-sbt | test-interface | 1.0 |\n| org.scalacheck | scalacheck\\_2.12 | 1.14.2 |\n| org.scalactic | scalactic\\_2.12 | 3.0.8 |\n| org.scalanlp | breeze-macros\\_2.12 | 1.2 |\n| org.scalanlp | breeze\\_2.12 | 1.2 |\n| org.scalatest | scalatest\\_2.12 | 3.0.8 |\n| org.slf4j | jcl-over-slf4j | 1.7.36 |\n| org.slf4j | jul-to-slf4j | 1.7.36 |\n| org.slf4j | slf4j-api | 1.7.36 |\n| org.spark-project.spark | unused | 1.0.0 |\n| org.threeten | threeten-extra | 1.5.0 |\n| org.tukaani | xz | 1.8 |\n| org.typelevel | algebra\\_2.12 | 2.0.1 |\n| org.typelevel | cats-kernel\\_2.12 | 2.1.1 |\n| org.typelevel | macro-compat\\_2.12 | 1.1.1 |\n| org.typelevel | spire-macros\\_2.12 | 0.17.0 |\n| org.typelevel | spire-platform\\_2.12 | 0.17.0 |\n| org.typelevel | spire-util\\_2.12 | 0.17.0 |\n| org.typelevel | spire\\_2.12 | 0.17.0 |\n| org.wildfly.openssl | wildfly-openssl | 1.0.7.Final |\n| org.xerial | sqlite-jdbc | 3.8.11.2 |\n| org.xerial.snappy | snappy-java | 1.1.8.4 |\n| org.yaml | snakeyaml | 1.24 |\n| oro | oro | 2.0.8 |\n| pl.edu.icm | JLargeArrays | 1.5 |\n| software.amazon.ion | ion-java | 1.0.2 |\n| stax | stax-api | 1.0.1 |\n\n", "chunk_id": "3983d278404f4b569c038f0b4e2acb19", "url": "https://docs.databricks.com/archive/runtime-release-notes/11.1.html"} +{"chunked_text": "# Databricks data engineering\n## What are init scripts?\n#### Use cluster-scoped init scripts\n\nCluster-scoped init scripts are init scripts defined in a cluster configuration. Cluster-scoped init scripts apply to both clusters you create and those created to run jobs. \nYou can configure cluster-scoped init scripts using the UI, the CLI, and by invoking the Clusters API. This section focuses on performing these tasks using the UI. For the other methods, see the [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html) and the [Clusters API](https://docs.databricks.com/api/workspace/clusters). 
\nYou can add any number of scripts, and the scripts are executed sequentially in the order provided. \nIf a cluster-scoped init script returns a non-zero exit code, the cluster launch *fails*. You can troubleshoot cluster-scoped init scripts by configuring [cluster log delivery](https://docs.databricks.com/compute/configure.html#cluster-log-delivery) and examining the init script log. See [Init script logging](https://docs.databricks.com/init-scripts/logs.html).\n\n", "chunk_id": "b1a25723931d4925624676e76dfcc36a", "url": "https://docs.databricks.com/init-scripts/cluster-scoped.html"} +{"chunked_text": "# Databricks data engineering\n## What are init scripts?\n#### Use cluster-scoped init scripts\n##### Configure a cluster-scoped init script using the UI\n\nThis section contains instructions for configuring a cluster to run an init script using the Databricks UI. \nDatabricks recommends managing all init scripts as cluster-scoped init scripts. If you are using compute with shared or single user access mode, store init scripts in Unity Catalog volumes. If you are using compute with no-isolation shared access mode, use workspace files for init scripts. \nFor shared access mode, you must add init scripts to the `allowlist`. See [Allowlist libraries and init scripts on shared compute](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/allowlist.html). \nTo use the UI to configure a cluster to run an init script, complete the following steps: \n1. On the cluster configuration page, click the **Advanced Options** toggle.\n2. At the bottom of the page, click the **Init Scripts** tab.\n3. In the **Source** drop-down, select the **Workspace**, **Volume**, or **S3** source type.\n4. 
Specify a path to the init script, such as one of the following examples: \n* For an init script stored in your home directory with workspace files: `/Users//.sh`.\n* For an init script stored with Unity Catalog volumes: `/Volumes/////.sh`.\n* For an init script stored with object storage: `s3://bucket-name/path/to/init-script`.\n5. Click **Add**. \nIn single user access mode, the identity of the assigned principal (a user or service principal) is used. \nIn shared access mode, the identity of the cluster owner is used. \nNote \nNo-isolation shared access mode does not support volumes, but uses the same identity assignment as shared access mode. \nTo remove a script from the cluster configuration, click the trash icon at the right of the script. When you confirm the deletion, you are prompted to restart the cluster. Optionally, you can delete the script file from the location you uploaded it to. \nNote \nIf you configure an init script using the **S3** source type, you must configure access credentials. \nDatabricks recommends using instance profiles to manage access to init scripts stored in S3. Use the following documentation in the cross-reference link to complete this setup: \n1. Create an IAM role with read and list permissions on your desired buckets. See [Tutorial: Configure S3 access with an instance profile](https://docs.databricks.com/connect/storage/tutorial-s3-instance-profile.html).\n2. Launch a cluster with the instance profile. See [Instance profiles](https://docs.databricks.com/compute/configure.html#instance-profiles). \nWarning \nCluster-scoped init scripts on DBFS are end-of-life. The **DBFS** option in the UI exists in some workspaces to support legacy workloads and is not recommended. All init scripts stored in DBFS should be migrated. 
For migration instructions, see [Migrate init scripts from DBFS](https://docs.databricks.com/init-scripts/index.html#migrate).\n\n", "chunk_id": "3e30baa0493c72e2b6b71ea167107ee4", "url": "https://docs.databricks.com/init-scripts/cluster-scoped.html"} +{"chunked_text": "# Databricks data engineering\n## What are init scripts?\n#### Use cluster-scoped init scripts\n##### Configure S3 region\n\nYou must specify the S3 region for the bucket containing the init script if the bucket is in a different region than your workspace. Select `auto` only if your bucket and workspace share a region.\n\n#### Use cluster-scoped init scripts\n##### Troubleshooting cluster-scoped init scripts\n\n* The script must exist at the configured location. If the script doesn\u2019t exist, attempts to start the cluster or scale up the executors result in failure.\n* The init script cannot be larger than 64KB. If a script exceeds that size, the cluster will fail to launch and a failure message will appear in the cluster log.\n\n", "chunk_id": "379b5dd05684f9d65587d7b30c39265f", "url": "https://docs.databricks.com/init-scripts/cluster-scoped.html"} +{"chunked_text": "# Develop on Databricks\n", "chunk_id": "75b81a85abab27cd8f51cc304141cc27", "url": "https://docs.databricks.com/dev-tools/index.html"} +{"chunked_text": "# Develop on Databricks\n### Developer tools and guidance\n\nLearn about tools and guidance you can use to work with Databricks resources and data and to develop Databricks applications. \n| Section | Use this section when you want to\u2026 |\n| --- | --- |\n| [Authentication](https://docs.databricks.com/dev-tools/auth/index.html) | Authenticate with Databricks from your tools, scripts, and apps. You must authenticate with Databricks before you can work with Databricks resources and data. 
|\n| **IDEs** | Connect to Databricks using [Databricks Connect](https://docs.databricks.com/dev-tools/databricks-connect/index.html) with popular integrated development environments (IDEs) such as [Visual Studio Code](https://docs.databricks.com/dev-tools/visual-studio-code.html), [PyCharm](https://docs.databricks.com/dev-tools/pycharm.html), [IntelliJ IDEA](https://docs.databricks.com/dev-tools/intellij-idea.html), [Eclipse](https://docs.databricks.com/dev-tools/eclipse.html), [RStudio](https://docs.databricks.com/dev-tools/rstudio.html), and [JupyterLab](https://docs.databricks.com/dev-tools/databricks-connect/python/jupyterlab.html), as well as Databricks IDE plugins. |\n| [SDKs](https://docs.databricks.com/dev-tools/index-sdk.html) | Automate Databricks from code libraries written for popular languages such as Python, Java, Go, and R. |\n| [SQL connectors/drivers](https://docs.databricks.com/dev-tools/index-driver.html) | Run SQL commands on Databricks from code written in popular languages such as Python, Go, JavaScript, and TypeScript. Connect tools and clients to Databricks through ODBC and JDBC connections. |\n| [SQL tools](https://docs.databricks.com/dev-tools/index-sql.html) | Run SQL commands and scripts in Databricks by using the [Databricks SQL CLI](https://docs.databricks.com/dev-tools/databricks-sql-cli.html), the [Databricks Driver for SQLTools](https://docs.databricks.com/dev-tools/sqltools-driver.html), and popular tools such as [DataGrip](https://docs.databricks.com/dev-tools/datagrip.html), [DBeaver](https://docs.databricks.com/dev-tools/dbeaver.html), and [SQL Workbench/J](https://docs.databricks.com/partners/bi/workbenchj.html). |\n| [Databricks CLI](https://docs.databricks.com/dev-tools/cli/index.html) | Access Databricks functionality using the Databricks command-line interface (CLI). 
|\n| [Utilities](https://docs.databricks.com/dev-tools/databricks-utils.html) | Use Databricks Utilities from within notebooks to do things such as work with object storage efficiently, chain and parameterize notebooks, and work with sensitive credential information. |\n| [IaC](https://docs.databricks.com/dev-tools/index-iac.html) | Automate the provision and maintenance of Databricks infrastructure and resources by using popular infrastructure-as-code (IaC) products such as Terraform, the Cloud Development Kit for Terraform, and Pulumi. |\n| [CI/CD](https://docs.databricks.com/dev-tools/index-ci-cd.html) | Implement industry-standard continuous integration and continuous delivery (CI/CD) practices for Databricks by using [Databricks Asset Bundles](https://docs.databricks.com/dev-tools/bundles/index.html), and popular systems and frameworks such as [GitHub Actions](https://docs.databricks.com/dev-tools/ci-cd/ci-cd-github.html), DevOps pipelines, [Jenkins](https://docs.databricks.com/dev-tools/ci-cd/ci-cd-jenkins.html), and [Apache Airflow](https://docs.databricks.com/workflows/jobs/how-to/use-airflow-with-jobs.html). | \nTip \nYou can also connect many additional popular third-party tools to clusters and SQL warehouses to access data in Databricks. See the [Technology partners](https://docs.databricks.com/integrations/index.html).\n\n", "chunk_id": "7baa8d639f170e8a1a35c8d20f14fb94", "url": "https://docs.databricks.com/dev-tools/index.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 10.5 for Machine Learning (unsupported)\n\nDatabricks Runtime 10.5 for Machine Learning provides a ready-to-go environment for machine learning and data science based on [Databricks Runtime 10.5 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/10.5.html). Databricks Runtime ML contains many popular machine learning libraries, including TensorFlow, PyTorch, and XGBoost. 
Databricks Runtime ML includes [AutoML](https://docs.databricks.com/machine-learning/automl/index.html), a tool to automatically train machine learning pipelines. Databricks Runtime ML also supports distributed deep learning training using Horovod. \nFor more information, including instructions for creating a Databricks Runtime ML cluster, see [AI and Machine Learning on Databricks](https://docs.databricks.com/machine-learning/index.html).\n\n", "chunk_id": "78aefd8651cd7eff8a4c6987fa5a3af8", "url": "https://docs.databricks.com/archive/runtime-release-notes/10.5ml.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 10.5 for Machine Learning (unsupported)\n##### New features and improvements\n\nDatabricks Runtime 10.5 ML is built on top of Databricks Runtime 10.5. For information on what\u2019s new in Databricks Runtime 10.5, including Apache Spark MLlib and SparkR, see the [Databricks Runtime 10.5 (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/10.5.html) release notes. \n### Enhancements to Databricks AutoML \nThe following enhancements have been made to [Databricks AutoML](https://docs.databricks.com/machine-learning/automl/index.html). \n* Improved memory usage allows AutoML to train on larger datasets.\n* With AutoML forecasting, you can now export the best model\u2019s predictions to a table using the API. If `output_database` is provided, AutoML saves predictions of the best model to a new table in the specified database. The predictions are not saved if `output_database` is not specified. \n### Enhancements to Databricks Feature Store \nThe following enhancements have been made to [Databricks Feature Store](https://docs.databricks.com/machine-learning/feature-store/index.html). \n* You can now delete an existing feature table with the `drop_table` API. 
This action also drops the underlying Delta table.\n* You can now use the [Python API](https://docs.databricks.com/machine-learning/feature-store/python-api.html) to add a tag to a feature table when you create or register it, and to add, update, delete, or read tags on existing feature tables. \n* The Feature Store client now supports publishing to a DynamoDB online store without explicitly passing in secrets. Instead, you can use the attached instance profile from the running Databricks cluster. For instructions, see [Publish features to an online store](https://docs.databricks.com/machine-learning/feature-store/publish-features.html). For API details, see [Python API](https://docs.databricks.com/machine-learning/feature-store/python-api.html).\n\n", "chunk_id": "6e1bc6c698df6ba20b74a42caa9a3bc4", "url": "https://docs.databricks.com/archive/runtime-release-notes/10.5ml.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 10.5 for Machine Learning (unsupported)\n##### System environment\n\nThe system environment in Databricks Runtime 10.5 ML differs from Databricks Runtime 10.5 as follows: \n* **DBUtils**: Databricks Runtime ML does not include [Library utility (dbutils.library) (legacy)](https://docs.databricks.com/archive/dev-tools/dbutils-library.html).\nUse `%pip` commands instead. 
See [Notebook-scoped Python libraries](https://docs.databricks.com/libraries/notebooks-python-libraries.html).\n* For GPU clusters, Databricks Runtime ML includes the following NVIDIA GPU libraries: \n+ CUDA 11.0\n+ cuDNN 8.0.5.39\n+ NCCL 2.10.3\n+ TensorRT 7.2.2\n\n", "chunk_id": "37d50c7c709f0b924d32f202ff22058a", "url": "https://docs.databricks.com/archive/runtime-release-notes/10.5ml.html"} +{"chunked_text": "# Databricks documentation archive\n## Unsupported Databricks Runtime release notes\n#### Databricks Runtime 10.5 for Machine Learning (unsupported)\n##### Libraries\n\nThe following sections list the libraries included in Databricks Runtime 10.5 ML that differ from those\nincluded in Databricks Runtime 10.5. \nIn this section: \n* [Top-tier libraries](https://docs.databricks.com/archive/runtime-release-notes/10.5ml.html#top-tier-libraries)\n* [Python libraries](https://docs.databricks.com/archive/runtime-release-notes/10.5ml.html#python-libraries)\n* [R libraries](https://docs.databricks.com/archive/runtime-release-notes/10.5ml.html#r-libraries)\n* [Java and Scala libraries (Scala 2.12 cluster)](https://docs.databricks.com/archive/runtime-release-notes/10.5ml.html#java-and-scala-libraries-scala-212-cluster) \n### [Top-tier libraries](https://docs.databricks.com/archive/runtime-release-notes/10.5ml.html#id1) \nDatabricks Runtime 10.5 ML includes the following top-tier [libraries](https://docs.databricks.com/machine-learning/index.html): \n* [GraphFrames](https://docs.databricks.com/integrations/graphframes/index.html)\n* [Horovod and HorovodRunner](https://docs.databricks.com/machine-learning/train-model/distributed-training/index.html)\n* [MLflow](https://docs.databricks.com/mlflow/index.html)\n* [PyTorch](https://docs.databricks.com/machine-learning/train-model/pytorch.html)\n* [spark-tensorflow-connector](https://docs.databricks.com/machine-learning/load-data/tfrecords-save-load.html#df-to-tfrecord)\n* 
[TensorFlow](https://docs.databricks.com/machine-learning/train-model/tensorflow.html)\n* [TensorBoard](https://docs.databricks.com/machine-learning/train-model/tensorboard.html) \n### [Python libraries](https://docs.databricks.com/archive/runtime-release-notes/10.5ml.html#id2) \nDatabricks Runtime 10.5 ML uses Virtualenv for Python package management and includes many popular ML packages. \nIn addition to the packages specified in the following sections, Databricks Runtime 10.5 ML also includes the following packages: \n* hyperopt 0.2.7.db1\n* sparkdl 2.2.0-db6\n* feature\\_store 0.4.1\n* automl 1.8.0 \n#### Python libraries on CPU clusters \n| Library | Version | Library | Version | Library | Version |\n| --- | --- | --- | --- | --- | --- |\n| absl-py | 0.11.0 | Antergos Linux | 2015.10 (ISO-Rolling) | appdirs | 1.4.4 |\n| argon2-cffi | 20.1.0 | astor | 0.8.1 | astunparse | 1.6.3 |\n| async-generator | 1.10 | attrs | 20.3.0 | backcall | 0.2.0 |\n| bcrypt | 3.2.0 | bidict | 0.21.4 | bleach | 3.3.0 |\n| blis | 0.7.7 | boto3 | 1.16.7 | botocore | 1.19.7 |\n| cachetools | 4.2.4 | catalogue | 2.0.7 | certifi | 2020.12.5 |\n| cffi | 1.14.5 | chardet | 4.0.0 | click | 7.1.2 |\n| cloudpickle | 1.6.0 | cmdstanpy | 0.9.68 | configparser | 5.0.1 |\n| convertdate | 2.4.0 | cryptography | 3.4.7 | cycler | 0.10.0 |\n| cymem | 2.0.6 | Cython | 0.29.23 | databricks-automl-runtime | 0.2.7 |\n| databricks-cli | 0.16.4 | dbl-tempo | 0.1.2 | dbus-python | 1.2.16 |\n| decorator | 5.0.6 | defusedxml | 0.7.1 | dill | 0.3.2 |\n| diskcache | 5.4.0 | distlib | 0.3.4 | distro-info | 0.23ubuntu1 |\n| entrypoints | 0.3 | ephem | 4.1.3 | facets-overview | 1.0.0 |\n| fasttext | 0.9.2 | filelock | 3.0.12 | Flask | 1.1.2 |\n| flatbuffers | 2.0 | fsspec | 0.9.0 | future | 0.18.2 |\n| gast | 0.4.0 | gitdb | 4.0.9 | GitPython | 3.1.12 |\n| google-auth | 1.22.1 | google-auth-oauthlib | 0.4.2 | google-pasta | 0.2.0 |\n| grpcio | 1.39.0 | gunicorn | 20.0.4 | gviz-api | 1.10.0 |\n| h5py | 3.1.0 
| hijri-converter | 2.2.3 | holidays | 0.13 |\n| horovod | 0.23.0 | htmlmin | 0.1.12 | huggingface-hub | 0.5.1 |\n| idna | 2.10 | ImageHash | 4.2.1 | imbalanced-learn | 0.8.1 |\n| importlib-metadata | 3.10.0 | ipykernel | 5.3.4 | ipython | 7.22.0 |\n| ipython-genutils | 0.2.0 | ipywidgets | 7.6.3 | isodate | 0.6.0 |\n| itsdangerous | 1.1.0 | jedi | 0.17.2 | Jinja2 | 2.11.3 |\n| jmespath | 0.10.0 | joblib | 1.0.1 | joblibspark | 0.3.0 |\n| jsonschema | 3.2.0 | jupyter-client | 6.1.12 | jupyter-core | 4.7.1 |\n| jupyterlab-pygments | 0.1.2 | jupyterlab-widgets | 1.0.0 | keras | 2.8.0 |\n| Keras-Preprocessing | 1.1.2 | kiwisolver | 1.3.1 | koalas | 1.8.2 |\n| korean-lunar-calendar | 0.2.1 | langcodes | 3.3.0 | libclang | 13.0.0 |\n| lightgbm | 3.3.2 | llvmlite | 0.38.0 | LunarCalendar | 0.0.9 |\n| Mako | 1.1.3 | Markdown | 3.3.3 | MarkupSafe | 2.0.1 |\n| matplotlib | 3.4.2 | missingno | 0.5.1 | mistune | 0.8.4 |\n| mleap | 0.18.1 | mlflow-skinny | 1.24.0 | multimethod | 1.8 |\n| murmurhash | 1.0.6 | nbclient | 0.5.3 | nbconvert | 6.0.7 |\n| nbformat | 5.1.3 | nest-asyncio | 1.5.1 | networkx | 2.5 |\n| nltk | 3.6.1 | notebook | 6.3.0 | numba | 0.55.1 |\n| numpy | 1.20.1 | oauthlib | 3.1.0 | opt-einsum | 3.3.0 |\n| packaging | 21.3 | pandas | 1.2.4 | pandas-profiling | 3.1.0 |\n| pandocfilters | 1.4.3 | paramiko | 2.7.2 | parso | 0.7.0 |\n| pathy | 0.6.1 | patsy | 0.5.1 | petastorm | 0.11.4 |\n| pexpect | 4.8.0 | phik | 0.12.2 | pickleshare | 0.7.5 |\n| Pillow | 8.2.0 | pip | 21.0.1 | plotly | 5.6.0 |\n| pmdarima | 1.8.5 | preshed | 3.0.6 | prometheus-client | 0.10.1 |\n| prompt-toolkit | 3.0.17 | prophet | 1.0.1 | protobuf | 3.17.2 |\n| psutil | 5.8.0 | psycopg2 | 2.8.5 | ptyprocess | 0.7.0 |\n| pyarrow | 4.0.0 | pyasn1 | 0.4.8 | pyasn1-modules | 0.2.8 |\n| pybind11 | 2.9.2 | pycparser | 2.20 | pydantic | 1.8.2 |\n| Pygments | 2.8.1 | PyGObject | 3.36.0 | PyMeeus | 0.5.11 |\n| PyNaCl | 1.5.0 | pyodbc | 4.0.30 | pyparsing | 2.4.7 |\n| pyrsistent | 0.17.3 | pystan | 
2.19.1.1 | python-apt | 2.0.0+ubuntu0.20.4.7 |\n| python-dateutil | 2.8.1 | python-editor | 1.0.4 | python-engineio | 4.3.0 |\n| python-socketio | 5.4.1 | pytz | 2020.5 | PyWavelets | 1.1.1 |\n| PyYAML | 5.4.1 | pyzmq | 20.0.0 | regex | 2021.4.4 |\n| requests | 2.25.1 | requests-oauthlib | 1.3.0 | requests-unixsocket | 0.2.0 |\n| rsa | 4.8 | s3transfer | 0.3.7 | sacremoses | 0.0.49 |\n| scikit-learn | 0.24.1 | scipy | 1.6.2 | seaborn | 0.11.1 |\n| Send2Trash | 1.5.0 | setuptools | 52.0.0 | setuptools-git | 1.2 |\n| shap | 0.40.0 | simplejson | 3.17.2 | six | 1.15.0 |\n| slicer | 0.0.7 | smart-open | 5.2.1 | smmap | 3.0.5 |\n| spacy | 3.2.3 | spacy-legacy | 3.0.9 | spacy-loggers | 1.0.2 |\n| spark-tensorflow-distributor | 1.0.0 | sqlparse | 0.4.1 | srsly | 2.4.3 |\n| ssh-import-id | 5.10 | statsmodels | 0.12.2 | tabulate | 0.8.7 |\n| tangled-up-in-unicode | 0.1.0 | tenacity | 6.2.0 | tensorboard | 2.8.0 |\n| tensorboard-data-server | 0.6.1 | tensorboard-plugin-profile | 2.5.0 | tensorboard-plugin-wit | 1.8.1 |\n| tensorflow-cpu | 2.8.0 | tensorflow-estimator | 2.8.0 | tensorflow-io-gcs-filesystem | 0.24.0 |\n| termcolor | 1.1.0 | terminado | 0.9.4 | testpath | 0.4.4 |\n| tf-estimator-nightly | 2.8.0.dev2021122109 | thinc | 8.0.15 | threadpoolctl | 2.1.0 |\n| tokenizers | 0.12.1 | torch | 1.10.2+cpu | torchvision | 0.11.3+cpu |\n| tornado | 6.1 | tqdm | 4.59.0 | traitlets | 5.0.5 |\n| transformers | 4.17.0 | typer | 0.4.1 | typing-extensions | 3.7.4.3 |\n| ujson | 4.0.2 | unattended-upgrades | 0.1 | urllib3 | 1.25.11 |\n| virtualenv | 20.4.1 | visions | 0.7.4 | wasabi | 0.9.1 |\n| wcwidth | 0.2.5 | webencodings | 0.5.1 | websocket-client | 0.57.0 |\n| Werkzeug | 1.0.1 | wheel | 0.36.2 | widgetsnbextension | 3.5.1 |\n| wrapt | 1.12.1 | xgboost | 1.5.2 | zipp | 3.4.1 | \n#### Python libraries on GPU clusters \n| Library | Version | Library | Version | Library | Version |\n| --- | --- | --- | --- | --- | --- |\n| absl-py | 0.11.0 | Antergos Linux | 2015.10 (ISO-Rolling) 
| appdirs | 1.4.4 |\n| argon2-cffi | 20.1.0 | astor | 0.8.1 | astunparse | 1.6.3 |\n| async-generator | 1.10 | attrs | 20.3.0 | backcall | 0.2.0 |\n| bcrypt | 3.2.0 | bidict | 0.21.4 | bleach | 3.3.0 |\n| blis | 0.7.7 | boto3 | 1.16.7 | botocore | 1.19.7 |\n| cachetools | 4.2.4 | catalogue | 2.0.7 | certifi | 2020.12.5 |\n| cffi | 1.14.5 | chardet | 4.0.0 | click | 7.1.2 |\n| cloudpickle | 1.6.0 | cmdstanpy | 0.9.68 | configparser | 5.0.1 |\n| convertdate | 2.4.0 | cryptography | 3.4.7 | cycler | 0.10.0 |\n| cymem | 2.0.6 | Cython | 0.29.23 | databricks-automl-runtime | 0.2.7 |\n| databricks-cli | 0.16.4 | dbl-tempo | 0.1.2 | dbus-python | 1.2.16 |\n| decorator | 5.0.6 | defusedxml | 0.7.1 | dill | 0.3.2 |\n| diskcache | 5.4.0 | distlib | 0.3.4 | distro-info | 0.23ubuntu1 |\n| entrypoints | 0.3 | ephem | 4.1.3 | facets-overview | 1.0.0 |\n| fasttext | 0.9.2 | filelock | 3.0.12 | Flask | 1.1.2 |\n| flatbuffers | 2.0 | fsspec | 0.9.0 | future | 0.18.2 |\n| gast | 0.4.0 | gitdb | 4.0.9 | GitPython | 3.1.12 |\n| google-auth | 1.22.1 | google-auth-oauthlib | 0.4.2 | google-pasta | 0.2.0 |\n| grpcio | 1.39.0 | gunicorn | 20.0.4 | gviz-api | 1.10.0 |\n| h5py | 3.1.0 | hijri-converter | 2.2.3 | holidays | 0.13 |\n| horovod | 0.23.0 | htmlmin | 0.1.12 | huggingface-hub | 0.5.1 |\n| idna | 2.10 | ImageHash | 4.2.1 | imbalanced-learn | 0.8.1 |\n| importlib-metadata | 3.10.0 | ipykernel | 5.3.4 | ipython | 7.22.0 |\n| ipython-genutils | 0.2.0 | ipywidgets | 7.6.3 | isodate | 0.6.0 |\n| itsdangerous | 1.1.0 | jedi | 0.17.2 | Jinja2 | 2.11.3 |\n| jmespath | 0.10.0 | joblib | 1.0.1 | joblibspark | 0.3.0 |\n| jsonschema | 3.2.0 | jupyter-client | 6.1.12 | jupyter-core | 4.7.1 |\n| jupyterlab-pygments | 0.1.2 | jupyterlab-widgets | 1.0.0 | keras | 2.8.0 |\n| Keras-Preprocessing | 1.1.2 | kiwisolver | 1.3.1 | koalas | 1.8.2 |\n| korean-lunar-calendar | 0.2.1 | langcodes | 3.3.0 | libclang | 13.0.0 |\n| lightgbm | 3.3.2 | llvmlite | 0.38.0 | LunarCalendar | 0.0.9 |\n| Mako | 1.1.3 | 
Markdown | 3.3.3 | MarkupSafe | 2.0.1 |\n| matplotlib | 3.4.2 | missingno | 0.5.1 | mistune | 0.8.4 |\n| mleap | 0.18.1 | mlflow-skinny | 1.24.0 | multimethod | 1.8 |\n| murmurhash | 1.0.6 | nbclient | 0.5.3 | nbconvert | 6.0.7 |\n| nbformat | 5.1.3 | nest-asyncio | 1.5.1 | networkx | 2.5 |\n| nltk | 3.6.1 | notebook | 6.3.0 | numba | 0.55.1 |\n| numpy | 1.20.1 | oauthlib | 3.1.0 | opt-einsum | 3.3.0 |\n| packaging | 21.3 | pandas | 1.2.4 | pandas-profiling | 3.1.0 |\n| pandocfilters | 1.4.3 | paramiko | 2.7.2 | parso | 0.7.0 |\n| pathy | 0.6.1 | patsy | 0.5.1 | petastorm | 0.11.4 |\n| pexpect | 4.8.0 | phik | 0.12.2 | pickleshare | 0.7.5 |\n| Pillow | 8.2.0 | pip | 21.0.1 | plotly | 5.6.0 |\n| pmdarima | 1.8.5 | preshed | 3.0.6 | prompt-toolkit | 3.0.17 |\n| prophet | 1.0.1 | protobuf | 3.17.2 | psutil | 5.8.0 |\n| psycopg2 | 2.8.5 | ptyprocess | 0.7.0 | pyarrow | 4.0.0 |\n| pyasn1 | 0.4.8 | pyasn1-modules | 0.2.8 | pybind11 | 2.9.2 |\n| pycparser | 2.20 | pydantic | 1.8.2 | Pygments | 2.8.1 |\n| PyGObject | 3.36.0 | PyMeeus | 0.5.11 | PyNaCl | 1.5.0 |\n| pyodbc | 4.0.30 | pyparsing | 2.4.7 | pyrsistent | 0.17.3 |\n| pystan | 2.19.1.1 | python-apt | 2.0.0+ubuntu0.20.4.7 | python-dateutil | 2.8.1 |\n| python-editor | 1.0.4 | python-engineio | 4.3.0 | python-socketio | 5.4.1 |\n| pytz | 2020.5 | PyWavelets | 1.1.1 | PyYAML | 5.4.1 |\n| pyzmq | 20.0.0 | regex | 2021.4.4 | requests | 2.25.1 |\n| requests-oauthlib | 1.3.0 | requests-unixsocket | 0.2.0 | rsa | 4.8 |\n| s3transfer | 0.3.7 | sacremoses | 0.0.49 | scikit-learn | 0.24.1 |\n| scipy | 1.6.2 | seaborn | 0.11.1 | Send2Trash | 1.5.0 |\n| setuptools | 52.0.0 | setuptools-git | 1.2 | shap | 0.40.0 |\n| simplejson | 3.17.2 | six | 1.15.0 | slicer | 0.0.7 |\n| smart-open | 5.2.1 | smmap | 3.0.5 | spacy | 3.2.3 |\n| spacy-legacy | 3.0.9 | spacy-loggers | 1.0.2 | spark-tensorflow-distributor | 1.0.0 |\n| sqlparse | 0.4.1 | srsly | 2.4.3 | ssh-import-id | 5.10 |\n| statsmodels | 0.12.2 | tabulate | 0.8.7 | 
tangled-up-in-unicode | 0.1.0 |\n| tenacity | 6.2.0 | tensorboard | 2.8.0 | tensorboard-data-server | 0.6.1 |\n| tensorboard-plugin-profile | 2.5.0 | tensorboard-plugin-wit | 1.8.1 | tensorflow | 2.8.0 |\n| tensorflow-estimator | 2.8.0 | tensorflow-io-gcs-filesystem | 0.24.0 | termcolor | 1.1.0 |\n| terminado | 0.9.4 | testpath | 0.4.4 | tf-estimator-nightly | 2.8.0.dev2021122109 |\n| thinc | 8.0.15 | threadpoolctl | 2.1.0 | tokenizers | 0.12.1 |\n| torch | 1.10.2+cu113 | torchvision | 0.11.3+cu113 | tornado | 6.1 |\n| tqdm | 4.59.0 | traitlets | 5.0.5 | transformers | 4.17.0 |\n| typer | 0.4.1 | typing-extensions | 3.7.4.3 | ujson | 4.0.2 |\n| unattended-upgrades | 0.1 | urllib3 | 1.25.11 | virtualenv | 20.4.1 |\n| visions | 0.7.4 | wasabi | 0.9.1 | wcwidth | 0.2.5 |\n| webencodings | 0.5.1 | websocket-client | 0.57.0 | Werkzeug | 1.0.1 |\n| wheel | 0.36.2 | widgetsnbextension | 3.5.1 | wrapt | 1.12.1 |\n| xgboost | 1.5.2 | zipp | 3.4.1 | | | \n#### Spark packages containing Python modules \n| Spark Package | Python Module | Version |\n| --- | --- | --- |\n| graphframes | graphframes | 0.8.2-db1-spark3.2 | \n### [R libraries](https://docs.databricks.com/archive/runtime-release-notes/10.5ml.html#id3) \nThe R libraries are identical to the [R Libraries](https://docs.databricks.com/archive/runtime-release-notes/10.5.html#rlibraries) in Databricks Runtime 10.5. 
\n### [Java and Scala libraries (Scala 2.12 cluster)](https://docs.databricks.com/archive/runtime-release-notes/10.5ml.html#id4) \nIn addition to Java and Scala libraries in Databricks Runtime 10.5, Databricks Runtime 10.5 ML contains the following JARs: \n#### CPU clusters \n| Group ID | Artifact ID | Version |\n| --- | --- | --- |\n| com.typesafe.akka | akka-actor\\_2.12 | 2.5.23 |\n| ml.combust.mleap | mleap-databricks-runtime\\_2.12 | 0.18.1-23eb1ef |\n| ml.dmlc | xgboost4j-spark\\_2.12 | 1.5.2 |\n| ml.dmlc | xgboost4j\\_2.12 | 1.5.2 |\n| org.graphframes | graphframes\\_2.12 | 0.8.2-db1-spark3.2 |\n| org.mlflow | mlflow-client | 1.24.0 |\n| org.mlflow | mlflow-spark | 1.24.0 |\n| org.scala-lang.modules | scala-java8-compat\\_2.12 | 0.8.0 |\n| org.tensorflow | spark-tensorflow-connector\\_2.12 | 1.15.0 | \n#### GPU clusters \n| Group ID | Artifact ID | Version |\n| --- | --- | --- |\n| com.typesafe.akka | akka-actor\\_2.12 | 2.5.23 |\n| ml.combust.mleap | mleap-databricks-runtime\\_2.12 | 0.18.1-23eb1ef |\n| ml.dmlc | xgboost4j-spark\\_2.12 | 1.5.2 |\n| ml.dmlc | xgboost4j\\_2.12 | 1.5.2 |\n| org.graphframes | graphframes\\_2.12 | 0.8.2-db1-spark3.2 |\n| org.mlflow | mlflow-client | 1.24.0 |\n| org.mlflow | mlflow-spark | 1.24.0 |\n| org.scala-lang.modules | scala-java8-compat\\_2.12 | 0.8.0 |\n| org.tensorflow | spark-tensorflow-connector\\_2.12 | 1.15.0 |\n\n", "chunk_id": "488557860a60c0461dd49ff7580e6cd8", "url": "https://docs.databricks.com/archive/runtime-release-notes/10.5ml.html"} +{"chunked_text": "# Databricks documentation archive\n### End-of-life for legacy workspaces\n\nImportant \nThis documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. \nImportant \nAll legacy workspaces were deleted on December 31, 2023. \nDatabricks will delete all legacy AWS workspaces on December 31, 2023. 
Follow the steps in this article to migrate your workspace.\n\n### End-of-life for legacy workspaces\n#### What are AWS legacy workspaces, and what are they being replaced with?\n\nAWS Legacy workspaces were released in May of 2015. They predate the next generation Databricks platform, the E2 version, which was released in September 2020.\n\n### End-of-life for legacy workspaces\n#### Why are AWS legacy workspaces being deprecated?\n\nThe E2 platform, which continues to be enhanced, provides a richer set of features such as [Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/index.html), [Delta Live Tables](https://docs.databricks.com/delta-live-tables/index.html), [Databricks SQL](https://docs.databricks.com/sql/index.html), the unified account console for [multi-workspace support](https://docs.databricks.com/admin/workspace/create-workspace.html), and more. E2 also has scalability and security enhancements for regulated industries that will benefit customers in other segments as well.\n\n### End-of-life for legacy workspaces\n#### How can you update a legacy AWS workspace to E2?\n\nMigrate your legacy workspaces to E2 by December 31, 2023. The sooner you migrate, the sooner you can benefit from the latest features and better experience provided by E2. 
\nThe [workspace migration guide](https://docs.databricks.com/_extras/documents/aws-st-workspace-migration-guide.pdf) will help you: \n* Check whether your workspace is a legacy workspace.\n* Migrate your legacy workspace to an E2 workspace.\n\n", "chunk_id": "90e3b47c5b29e055734ade1aec8498f0", "url": "https://docs.databricks.com/archive/aws/end-of-life-legacy-workspaces.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `isnull` function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nReturns `true` if `expr` is `NULL`. This function is a synonym for [is null operator](https://docs.databricks.com/sql/language-manual/functions/isnullop.html).\n\n####### `isnull` function\n######## Syntax\n\n```\nisnull(expr)\n\n```\n\n####### `isnull` function\n######## Arguments\n\n* `expr`: An expression of any type.\n\n####### `isnull` function\n######## Returns\n\nA BOOLEAN.\n\n####### `isnull` function\n######## Examples\n\n```\n> SELECT isnull(1);\nfalse\n\n```\n\n####### `isnull` function\n######## Related functions\n\n* [isnotnull function](https://docs.databricks.com/sql/language-manual/functions/isnotnull.html)\n* [isnan function](https://docs.databricks.com/sql/language-manual/functions/isnan.html)\n* [is null operator](https://docs.databricks.com/sql/language-manual/functions/isnullop.html)\n\n", "chunk_id": "e7e29f1363f0a6dc47622ef9f279489c", "url": "https://docs.databricks.com/sql/language-manual/functions/isnull.html"} +{"chunked_text": "# \n### Use certified answers in Genie spaces\n\nPreview \nThis feature is in [Public Preview](https://docs.databricks.com/release-notes/release-types.html). 
\nThis article defines certified answers and explains how to use them to increase trust and confidence in responses provided in a Genie space.\n\n### Use certified answers in Genie spaces\n#### What are certified answers?\n\nCertified answers allow you to explicitly define validated, parameterized SQL queries as recipes for answering common questions. They can reduce the likelihood of non-technical users receiving responses that are misleading, incorrect, or hard to interpret. Certified answers help the Genie space provide accurate answers to common questions and let users know when the response they receive has been verified. \n![Certified answer response](https://docs.databricks.com/_images/certified-answer.png) \nNote \nCertified answers are not a substitute for all other instructions. Databricks recommends using certified answers only for recurring, well-established questions. They provide exact answers to specific questions and are not reused by the Assistant to address adjacent questions.\n\n### Use certified answers in Genie spaces\n#### Why create certified answers?\n\nGenie spaces return the result of a generated SQL query to answer user questions. Business users can potentially include jargon that is hard to parse for the large language model (LLM) that generates queries. Suppose a business user provides a prompt like, \u201cShow me the open pipeline in our APAC region.\u201d If `open pipeline` does not correspond directly to a field in one of the tables in your Genie space, the user might get an empty result set accompanied by a generated SQL query, as in the following response: \n![Empty result response](https://docs.databricks.com/_images/empty-result.png) \nFor most business users, it is difficult to interpret or troubleshoot this response. 
Genie space authors can define certified answers to provide trusted responses for questions like this.\n\n", "chunk_id": "836ea1d6b869e259bfbd0abc19e2aa89", "url": "https://docs.databricks.com/prpr-ans-67656E69652D737061636573.html"} +{"chunked_text": "# \n### Use certified answers in Genie spaces\n#### Define a certified answer\n\nTo define a certified answer, identify the question you expect users to ask. Then do the following: \n1. Define and test a SQL query that answers the question. \nThe following is an example query designed to answer the question in the previous example. The table this query returns includes results from all regions in the data. \n```\nSELECT\no.id AS `OppId`,\na.region__c AS `Region`,\no.name AS `Opportunity Name`,\no.forecastcategory AS `Forecast Category`,\no.stagename,\no.closedate AS `Close Date`,\no.amount AS `Opp Amount`\nFROM\nusers.user_name.opportunity o\nJOIN catalog.schema.accounts a ON o.accountid = a.id\n\nWHERE\no.forecastcategory = 'Pipeline' AND\no.stagename NOT LIKE '%closed%';\n\n```\n2. Define a Unity Catalog function. \nYour Unity Catalog function should parameterize the query and produce results matching the specific conditions a user might inquire about. \nSee [Create a SQL table function](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-ddl-create-sql-function.html#create-a-sql-table-function) to learn how to define a Unity Catalog function. \nThe following function takes a list of regions and returns a table. The comments provided in the function definitions are critical for instructing the Genie space on when and how to invoke this function. This example includes comments in the function\u2019s parameter definition and comments defined in the SQL table function that explain what the function does. \n* **Parameter comments**: The `open_opps_in_region` function expects an array of strings as a parameter. The comment includes an example of the expected input. 
If no parameter is supplied, the default value is `NULL`.\n* **Function comments**: The comment in the SQL table function explains what the function does. \nThe associated SQL query has been adjusted to use the function parameter in the `WHERE` clause. \n```\nCREATE OR REPLACE FUNCTION users.user_name.open_opps_in_region (\nregions ARRAY<STRING> COMMENT 'List of regions. Example: [\"APAC\", \"EMEA\"]' DEFAULT NULL\n) RETURNS TABLE\nCOMMENT 'Addresses questions about the pipeline in a region by returning a list of all the open opportunities.'\nRETURN\n\nSELECT\no.id AS `OppId`,\na.region__c AS `Region`,\no.name AS `Opportunity Name`,\no.forecastcategory AS `Forecast Category`,\no.stagename,\no.closedate AS `Close Date`,\no.amount AS `Opp Amount`\nFROM\ncatalog.schema.accounts.opportunity o\nJOIN catalog.schema.accounts a ON o.accountid = a.id\nWHERE\no.forecastcategory = 'Pipeline' AND\no.stagename NOT LIKE '%closed%' AND\n(isnull(open_opps_in_region.regions) OR array_contains(open_opps_in_region.regions, region__c));\n\n``` \nWhen you run the code to create a function, it\u2019s registered to the currently active schema by default. See [Custom SQL functions in Unity Catalog](https://docs.databricks.com/udf/unity-catalog.html#custom-uc-functions).\n3. Add the certified answer. \nAfter the function is published to Unity Catalog, a user with at least CAN EDIT permission on the Genie space can add it in the **Instructions** tab of the Genie space. \n![Add certified answer button](https://docs.databricks.com/_images/btn-certified-answers.png)\n\n", "chunk_id": "fee27e3dc1f830c1b5f954cf1f152a08", "url": "https://docs.databricks.com/prpr-ans-67656E69652D737061636573.html"} +{"chunked_text": "# \n### Use certified answers in Genie spaces\n#### Required permissions\n\nGenie space authors with at least CAN EDIT permission on a Genie space can add or remove certified answers. \nGenie space users must have CAN USE permission on the catalog and schema that contain the function. 
To invoke a certified answer, they must have EXECUTE permission on the function in Unity Catalog. Unity Catalog securable objects inherit permissions from their parent containers. See [Securable objects in Unity Catalog](https://docs.databricks.com/data-governance/unity-catalog/manage-privileges/privileges.html#securable-objects). \nTo simplify sharing in a Genie space, Databricks recommends creating a dedicated schema to contain all of the functions that you want to use in your Genie space.\n\n", "chunk_id": "b6b86abd0e158e823188b4421109d8a1", "url": "https://docs.databricks.com/prpr-ans-67656E69652D737061636573.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `count` aggregate function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nReturns the number of retrieved rows in a group.\n\n####### `count` aggregate function\n######## Syntax\n\n```\ncount ( [DISTINCT | ALL] * ) [FILTER ( WHERE cond ) ]\n\n``` \n```\ncount ( [DISTINCT | ALL] expr [, ...] ) [FILTER ( WHERE cond ) ]\n\n``` \nThis function can also be invoked as a [window function](https://docs.databricks.com/sql/language-manual/sql-ref-window-functions.html) using the `OVER` clause.\n\n####### `count` aggregate function\n######## Arguments\n\n* `*`: Counts all rows in the group.\n* `expr`: Counts all rows for which all `exprN` are not `NULL`.\n* `cond`: An optional boolean expression filtering the rows used for aggregation.\n\n####### `count` aggregate function\n######## Returns\n\nA `BIGINT`. \nIf `DISTINCT` is specified then the function returns the number of unique values which do not contain `NULL`. \nIf `ALL` is specified then the function returns the number of all values. 
In case of `*` this includes those containing `NULL`.\n\n", "chunk_id": "d8a929831ae1ca5d106cb98dd1b2bae8", "url": "https://docs.databricks.com/sql/language-manual/functions/count.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `count` aggregate function\n######## Examples\n\n```\n> SELECT count(*) FROM VALUES (NULL), (5), (5), (20) AS tab(col);\n4\n\n> SELECT count(1) FROM VALUES (NULL), (5), (5), (20) AS tab(col);\n4\n\n> SELECT count(col) FROM VALUES (NULL), (5), (5), (20) AS tab(col);\n3\n\n> SELECT count(col) FILTER(WHERE col < 10)\nFROM VALUES (NULL), (5), (5), (20) AS tab(col);\n2\n\n> SELECT count(DISTINCT col) FROM VALUES (NULL), (5), (5), (10) AS tab(col);\n2\n\n> SELECT count(col1, col2)\nFROM VALUES (NULL, NULL), (5, NULL), (5, 1), (5, 2), (5, 2), (NULL, 2), (20, 2) AS tab(col1, col2);\n4\n\n> SELECT count(DISTINCT col1, col2)\nFROM VALUES (NULL, NULL), (5, NULL), (5, 1), (5, 2), (NULL, 2), (20, 2) AS tab(col1, col2);\n3\n\n```\n\n", "chunk_id": "f8028444d75a99ee1a04dab8f922c5d9", "url": "https://docs.databricks.com/sql/language-manual/functions/count.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `count` aggregate function\n######## Related functions\n\n* [avg aggregate function](https://docs.databricks.com/sql/language-manual/functions/avg.html)\n* [sum aggregate function](https://docs.databricks.com/sql/language-manual/functions/sum.html)\n* [min aggregate function](https://docs.databricks.com/sql/language-manual/functions/min.html)\n* [max aggregate function](https://docs.databricks.com/sql/language-manual/functions/max.html)\n* [count\\_if aggregate function](https://docs.databricks.com/sql/language-manual/functions/count_if.html)\n* [Window 
functions](https://docs.databricks.com/sql/language-manual/sql-ref-window-functions.html)\n\n", "chunk_id": "f7214233d2cdca6d04e57868b753875f", "url": "https://docs.databricks.com/sql/language-manual/functions/count.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### ANALYZE TABLE\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nThe `ANALYZE TABLE` statement collects statistics about a specific table or all tables in a specified schema. These statistics are used by the query optimizer to generate an optimal query plan. Because they can become outdated as data changes, these statistics are not used to directly answer queries. Stale statistics are still useful for the query optimizer when creating a query plan.\n\n#### ANALYZE TABLE\n##### Syntax\n\n```\nANALYZE TABLE table_name [ PARTITION clause ]\nCOMPUTE [ DELTA ] STATISTICS [ NOSCAN | FOR COLUMNS col1 [, ...] | FOR ALL COLUMNS ]\n\nANALYZE TABLES [ { FROM | IN } schema_name ] COMPUTE STATISTICS [ NOSCAN ]\n\n```\n\n", "chunk_id": "13bd2185dfe6f44fd1ca04e39b70f628", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-analyze-table.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### ANALYZE TABLE\n##### Parameters\n\n* **[table\\_name](https://docs.databricks.com/sql/language-manual/sql-ref-names.html#table-name)** \nIdentifies the table to be analyzed. 
The name must not include a [temporal specification](https://docs.databricks.com/sql/language-manual/sql-ref-names.html#table-name) or path.\nIf the table cannot be found Databricks raises a [TABLE\\_OR\\_VIEW\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/table-or-view-not-found-error-class.html) error.\n* **[PARTITION clause](https://docs.databricks.com/sql/language-manual/sql-ref-partition.html#partition)** \nOptionally limits the command to a subset of partitions. \nThis clause is not supported for Delta Lake tables.\n* **`DELTA`** \n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 14.3 LTS and above \nRecomputes statistics stored in the Delta log for the columns configured for statistics collection in a Delta table. \nWhen the `DELTA` keyword is specified, normal statistics for the query optimizer are not collected. \nDatabricks recommends running `ANALYZE TABLE table_name COMPUTE DELTA STATISTICS` after setting new columns for data skipping to update statistics for all rows in a table. For optimized performance, run `ANALYZE TABLE table_name COMPUTE STATISTICS` to update the query plan after the Delta log update completes.\n* **[ NOSCAN | FOR COLUMNS col [, \u2026] | FOR ALL COLUMNS ]** \nIf no analyze option is specified, `ANALYZE TABLE` collects the table\u2019s number of rows and size in bytes. \n+ **NOSCAN** \nCollect only the table\u2019s size in bytes ( which does not require scanning the entire table ).\n+ **FOR COLUMNS col [, \u2026] | FOR ALL COLUMNS** \nCollect column statistics for each column specified, or alternatively for every column, as well as table statistics. \nColumn statistics are not supported in combination with the `PARTITION` clause.\n* **{ FROM `|` IN } [schema\\_name](https://docs.databricks.com/sql/language-manual/sql-ref-names.html#schema-name)** \nSpecifies the name of the schema to be analyzed. 
Without a schema name, `ANALYZE TABLES` collects statistics for all tables in the current schema that the current user has permission to analyze.\n\n", "chunk_id": "79cd64fa368396d36a0f61ccf1542dcd", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-analyze-table.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### ANALYZE TABLE\n##### Examples\n\n```\n> CREATE TABLE students (name STRING, student_id INT) PARTITIONED BY (student_id);\n> INSERT INTO students PARTITION (student_id = 111111) VALUES ('Mark');\n> INSERT INTO students PARTITION (student_id = 222222) VALUES ('John');\n\n> ANALYZE TABLE students COMPUTE STATISTICS NOSCAN;\n\n> DESC EXTENDED students;\ncol_name data_type comment\n-------------------- -------------------- -------\nname string null\nstudent_id int null\n... ... ...\nStatistics 864 bytes\n... ... ...\n\n> ANALYZE TABLE students COMPUTE STATISTICS;\n\n> DESC EXTENDED students;\ncol_name data_type comment\n-------------------- -------------------- -------\nname string null\nstudent_id int null\n... ... ...\nStatistics 864 bytes, 2 rows\n... ... ...\n\n-- Note: ANALYZE TABLE .. PARTITION is not supported for Delta tables.\n> ANALYZE TABLE students PARTITION (student_id = 111111) COMPUTE STATISTICS;\n\n> DESC EXTENDED students PARTITION (student_id = 111111);\ncol_name data_type comment\n-------------------- -------------------- -------\nname string null\nstudent_id int null\n... ... ...\nPartition Statistics 432 bytes, 1 rows\n... ... 
...\nOutputFormat org.apache.hadoop...\n\n> ANALYZE TABLE students COMPUTE STATISTICS FOR COLUMNS name;\n\n> DESC EXTENDED students name;\ninfo_name info_value\n-------------- ----------\ncol_name name\ndata_type string\ncomment NULL\nmin NULL\nmax NULL\nnum_nulls 0\ndistinct_count 2\navg_col_len 4\nmax_col_len 4\nhistogram NULL\n\n> ANALYZE TABLES IN school_schema COMPUTE STATISTICS NOSCAN;\n> DESC EXTENDED teachers;\ncol_name data_type comment\n-------------------- -------------------- -------\nname string null\nteacher_id int null\n... ... ...\nStatistics 1382 bytes\n... ... ...\n\n> DESC EXTENDED students;\ncol_name data_type comment\n-------------------- -------------------- -------\nname string null\nstudent_id int null\n... ... ...\nStatistics 864 bytes\n... ... ...\n\n> ANALYZE TABLES COMPUTE STATISTICS;\n> DESC EXTENDED teachers;\ncol_name data_type comment\n-------------------- -------------------- -------\nname string null\nteacher_id int null\n... ... ...\nStatistics 1382 bytes, 2 rows\n... ... ...\n\n> DESC EXTENDED students;\ncol_name data_type comment\n-------------------- -------------------- -------\nname string null\nstudent_id int null\n... ... ...\nStatistics 864 bytes, 2 rows\n... ... 
...\n\n> ANALYZE TABLE some_delta_table COMPUTE DELTA STATISTICS;\n\n```\n\n", "chunk_id": "aba40b5fd96d1670ca60aae4e3cca4cf", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-analyze-table.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### ANALYZE TABLE\n##### Related articles\n\n* [PARTITION](https://docs.databricks.com/sql/language-manual/sql-ref-partition.html#partition)\n\n", "chunk_id": "b0e40b5a5911a80aeb389f48f51057f7", "url": "https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-analyze-table.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### DELETE FROM\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nDeletes the rows that match a predicate. When no predicate is provided, deletes all rows. \nThis statement is only supported for Delta Lake tables.\n\n#### DELETE FROM\n##### Syntax\n\n```\nDELETE FROM table_name [table_alias] [WHERE predicate]\n\n```\n\n", "chunk_id": "9066a98fc8a629ecfa245b2f2cb696bf", "url": "https://docs.databricks.com/sql/language-manual/delta-delete-from.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### DELETE FROM\n##### Parameters\n\n* [table\\_name](https://docs.databricks.com/sql/language-manual/sql-ref-names.html#table-name) \nIdentifies an existing table. The name must not include a [temporal specification](https://docs.databricks.com/sql/language-manual/sql-ref-names.html#table-name). \n`table_name` must not be a foreign table.\n* [table\\_alias](https://docs.databricks.com/sql/language-manual/sql-ref-names.html#table-alias) \nDefine an alias for the table. The alias must not include a column list.\n* **[WHERE](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-select-where.html)** \nFilter rows by predicate. 
\nThe `WHERE` predicate supports subqueries, including `IN`, `NOT IN`, `EXISTS`, `NOT EXISTS`, and scalar subqueries. The following types of subqueries are not supported: \n+ Nested subqueries, that is, a subquery inside another subquery\n+ `NOT IN` subquery inside an `OR`, for example, `a = 3 OR b NOT IN (SELECT c FROM t)` \nIn most cases, you can rewrite `NOT IN` subqueries using `NOT EXISTS`. We recommend using `NOT EXISTS` whenever possible, as `DELETE` with `NOT IN` subqueries can be slow.\n\n", "chunk_id": "55ddda2271c6629d1908297dc5ef2c85", "url": "https://docs.databricks.com/sql/language-manual/delta-delete-from.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n#### DELETE FROM\n##### Examples\n\n```\n> DELETE FROM events WHERE date < '2017-01-01'\n\n> DELETE FROM all_events\nWHERE session_time < (SELECT min(session_time) FROM good_events)\n\n> DELETE FROM orders AS t1\nWHERE EXISTS (SELECT oid FROM returned_orders WHERE t1.oid = oid)\n\n> DELETE FROM events\nWHERE category NOT IN (SELECT category FROM events2 WHERE date > '2001-01-01')\n\n``` \n* [COPY](https://docs.databricks.com/sql/language-manual/delta-copy-into.html)\n* [INSERT](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-dml-insert-into.html)\n* [MERGE](https://docs.databricks.com/sql/language-manual/delta-merge-into.html)\n* [PARTITION](https://docs.databricks.com/sql/language-manual/sql-ref-partition.html#partition)\n* [query](https://docs.databricks.com/sql/language-manual/sql-ref-syntax-qry-query.html)\n* [UPDATE](https://docs.databricks.com/sql/language-manual/delta-update.html)\n\n", "chunk_id": "85036847fce272d2799bd027f90568d9", "url": "https://docs.databricks.com/sql/language-manual/delta-delete-from.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### September 2020\n\nThese features and Databricks platform improvements were released in September 2020. \nNote \nReleases are staged. 
Your Databricks account may not be updated until up to a week after the initial release date.\n\n#### September 2020\n##### Databricks Runtime 7.3, 7.3 ML, and 7.3 Genomics are now GA\n\n**September 24, 2020** \nDatabricks Runtime 7.3, Databricks Runtime 7.3 for Machine Learning, and Databricks Runtime 7.3 for Genomics are now generally available. They bring many features and improvements, including: \n* Delta Lake performance optimizations significantly reduce overhead\n* Clone metrics\n* Delta Lake `MERGE INTO` improvements\n* Specify the initial position for Delta Lake Structured Streaming\n* Auto Loader improvements\n* Adaptive query execution\n* Azure Synapse Analytics connector column length control\n* Improved behavior of `dbutils.credentials.showRoles`\n* Kinesis starting position for stream using `at_timestamp`\n* Simplified pandas to Spark DataFrame conversion\n* New `maxResultSize` in `toPandas()` call\n* Debuggability of pandas and PySpark UDFs\n* GA of S3 storage connector updates\n* (ML only) Conda activation on workers\n* (Genomics only) Support for reading BGEN files with uncompressed or zstd-compressed genotypes\n* Library upgrades \nFor more information, see [Databricks Runtime 7.3 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.3lts.html) and [Databricks Runtime 7.3 LTS for Machine Learning (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html).\n\n", "chunk_id": "75d6f2f9bcd7490148f44ee73a6e0a69", "url": "https://docs.databricks.com/release-notes/product/2020/september.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### September 2020\n##### Debugging hints for SAML credential passthrough misconfigurations\n\n**September 23-29, 2020: Version 3.29** \nThe response from a single-sign on request using SAML credential passthrough now includes an error hint to help debug misconfigurations. 
For details, see [Troubleshooting](https://docs.databricks.com/archive/credential-passthrough/iam-federation.html#troubleshooting).\n\n#### September 2020\n##### Single Node clusters (Public Preview)\n\n**September 23-29, 2020: Version 3.29** \nA Single Node cluster is a cluster consisting of a Spark driver and no Spark workers. In contrast, Standard mode clusters require at least one Spark worker to run Spark jobs. Single Node mode clusters are helpful in the following situations: \n* Running single node machine learning workloads that need Spark to load and save data\n* Lightweight exploratory data analysis (EDA) \nFor details, see [Single-node or multi-node compute](https://docs.databricks.com/compute/configure.html#single-node).\n\n#### September 2020\n##### DBFS REST API rate limiting\n\n**September 23-29, 2020: Version 3.29** \nTo ensure high quality of service under heavy load, Databricks is now enforcing API rate limits for [DBFS API](https://docs.databricks.com/api/workspace/dbfs) calls. Limits are set per workspace to ensure fair usage and high availability. Automatic retries are available using Databricks CLI version 0.12.0 and above. We advise all customers to switch to the latest Databricks CLI version.\n\n#### September 2020\n##### New sidebar icons\n\n**September 23-29, 2020** \nWe\u2019ve updated the sidebar in the Databricks workspace UI. No big deal, but we think the new icons look pretty nice. \n![sidebar](https://docs.databricks.com/_images/new-sidebar-icons.png)\n\n", "chunk_id": "9ce9b460619df36419ebfa6790bb2230", "url": "https://docs.databricks.com/release-notes/product/2020/september.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### September 2020\n##### Running jobs limit increase\n\n**September 23-29, 2020: Version 3.29** \nThe concurrent running job run limit has been increased from 150 to 1000 per workspace. No longer will runs over 150 be queued in the pending state. 
Instead of queuing run requests that exceed the concurrent run limit, Databricks returns a `429 Too Many Requests` response when you request a run that cannot be started immediately. This limit increase was rolled out gradually and is now available on all workspaces in all regions.\n\n#### September 2020\n##### Artifact access control lists (ACLs) in MLflow\n\n**September 23-29, 2020: Version 3.29** \nMLflow Experiment permissions are now enforced on artifacts in MLflow Tracking, enabling you to easily control access to your models, datasets, and other files. By default, when you create a new experiment, its run artifacts are now stored in an MLflow-managed location. The four MLflow Experiment permission levels (NO PERMISSIONS, CAN READ, CAN EDIT, and CAN MANAGE) automatically apply to run artifacts stored in MLflow-managed locations as follows: \n* CAN EDIT or CAN MANAGE permissions are required to log run artifacts to an experiment.\n* CAN READ permissions are required to list and download run artifacts from an experiment. \nFor more information, see [MLflow experiment ACLs](https://docs.databricks.com/security/auth-authz/access-control/index.html#experiments).\n\n#### September 2020\n##### MLflow usability improvements\n\n**September 23-29, 2020: Version 3.29** \nThis release includes the following MLflow usability improvements: \n* The MLflow **Experiment** and **Registered Models** pages now have tips to help new users get started.\n* The model version table now shows the description text for a model version. 
A new column shows the first 32 characters or the first line (whichever is shorter) of the description.\n\n", "chunk_id": "2e13a0f8ddecedaefd04fb47793fbf1a", "url": "https://docs.databricks.com/release-notes/product/2020/september.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### September 2020\n##### New Databricks Power BI connector (Public Preview)\n\n**September 22, 2020** \nPower BI Desktop version 2.85.681.0 includes a new Databricks Power BI connector that makes the integration between Databricks and Power BI far more seamless and reliable. The new connector comes with the following improvements: \n* Simple connection configuration: the new Power BI Databricks connector is integrated into Power BI, and you configure it using a simple dialog with a couple of clicks.\n* Faster imports and optimized metadata calls, thanks to the new Databricks ODBC driver, which comes with significant performance improvements.\n* Access to Databricks data through Power BI respects Databricks [table access control](https://docs.databricks.com/data-governance/table-acls/index.html). \nFor more information, see [Connect Power BI to Databricks](https://docs.databricks.com/partners/bi/power-bi.html).\n\n#### September 2020\n##### New JDBC and ODBC drivers bring faster and lower latency BI\n\n**September 15, 2020** \nWe have released new versions of the Databricks JDBC and ODBC drivers [(download)](https://databricks.com/spark/odbc-driver-download) with the following improvements: \n* Performance: Reduced connection and short query latency, improved result transfer speed based on Apache Arrow serialization and improved metadata retrieval performance.\n* User experience: Authentication using Microsoft Entra ID OAuth2 access tokens, improved error messages and auto-retry when connecting to a shutdown cluster, more robust handling of retries on intermittent network errors.\n* Support for connections using HTTP proxy. 
\nFor more information about connecting to BI tools using JDBC and ODBC, see [Databricks ODBC and JDBC Drivers](https://docs.databricks.com/integrations/jdbc-odbc-bi.html).\n\n", "chunk_id": "cae73728ba4dfcfd0e956639e7978a9f", "url": "https://docs.databricks.com/release-notes/product/2020/september.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### September 2020\n##### MLflow Model Serving (Public Preview)\n\n**September 9-15, 2020: Version 3.28** \nMLflow Model Serving is now available in Public Preview. MLflow Model Serving allows you to deploy an MLflow model registered in Model Registry as a REST API endpoint hosted and managed by Databricks. When you enable model serving for a registered model, Databricks creates a cluster and deploys all non-archived versions of that model. \nYou can query all model versions by REST API requests with standard Databricks authentication. Model access rights are inherited from the Model Registry \u2014 anyone with read rights for a registered model can query any of the deployed model versions. While this service is in preview, we recommend its use for low-throughput and non-critical applications. \nFor more information, see [Legacy MLflow Model Serving on Databricks](https://docs.databricks.com/archive/legacy-model-serving/model-serving.html).\n\n#### September 2020\n##### Clusters UI improvements\n\n**September 9-15, 2020: Version 3.28** \nThe [Clusters page](https://docs.databricks.com/compute/clusters-manage.html#cluster-list) now has separate tabs for **All-Purpose Clusters** and **Job Clusters**. The list on each tab is now paginated. 
In addition, we have fixed the delay that sometimes occurred between creating a cluster and being able to see it in the UI.\n\n", "chunk_id": "e1325cf77f85c1a8571dd67f32ab4bff", "url": "https://docs.databricks.com/release-notes/product/2020/september.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### September 2020\n##### Visibility controls for jobs, clusters, notebooks, and other workspace objects\n\n**September 9-15, 2020: Version 3.28** \nBy default, any user can see all jobs, clusters, notebooks, and folders in their workspace displayed in the Databricks UI and can list them using the Databricks API, even when access control is enabled for those objects and a user has no permissions on those objects. \nNow any Databricks admin can enable visibility controls for notebooks and folders (workspace objects), clusters, and jobs to ensure that users can view only those objects that they have been given access to through workspace, cluster, or jobs access control. \nSee [Access controls lists can no longer be disabled](https://docs.databricks.com/release-notes/product/2023/november.html#acls).\n\n#### September 2020\n##### Ability to create tokens no longer permitted by default\n\n**September 9-15, 2020: Version 3.28** \nFor workspaces created after the release of Databricks platform version 3.28, users will no longer have the ability to generate personal access tokens by default. Admins must explicitly grant those permissions, whether to the entire `users` group or on a user-by-user or group-by-group basis. Workspaces created before 3.28 was released will maintain the permissions that were already in place. 
\nSee [Monitor and manage personal access tokens](https://docs.databricks.com/admin/access-control/tokens.html).\n\n#### September 2020\n##### Support for c5.24xlarge instances\n\n**September 9-15, 2020: Version 3.28** \nDatabricks now supports the c5.24xlarge EC2 instance type.\n\n", "chunk_id": "f79da20d605c9f869d9ef0caa9da709c", "url": "https://docs.databricks.com/release-notes/product/2020/september.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### September 2020\n##### MLflow Model Registry supports sharing of models across workspaces\n\n**September 9, 2020** \nDatabricks now supports access to the model registry from multiple workspaces. You can now register models, track model runs, and load models across workspaces. Multiple teams can now share access to models, and organizations can use multiple workspaces to handle the different stages of development. For details, see [Share models across workspaces](https://docs.databricks.com/machine-learning/manage-model-lifecycle/multiple-workspaces.html). \nThis functionality requires MLflow Python client version 1.11.0 or above.\n\n#### September 2020\n##### Databricks Runtime 7.3 (Beta)\n\n**September 3, 2020** \nDatabricks Runtime 7.3, Databricks Runtime 7.3 for Machine Learning, and Databricks Runtime 7.3 for Genomics are now available as Beta releases. 
\nFor information, see [Databricks Runtime 7.3 LTS (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.3lts.html) and [Databricks Runtime 7.3 LTS for Machine Learning (unsupported)](https://docs.databricks.com/archive/runtime-release-notes/7.3lts-ml.html).\n\n", "chunk_id": "572f7a98e2ca7023efd1977ef6da4062", "url": "https://docs.databricks.com/release-notes/product/2020/september.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### September 2020\n##### E2 architecture\u2014now GA\u2014provides better security, scalability, and management tools\n\n**September 1, 2020** \nDatabricks is excited to announce the general availability of the new [E2 architecture](https://docs.databricks.com/archive/aws/end-of-life-legacy-workspaces.html#e2-architecture) for the Databricks Unified Data Analytics Platform on AWS. With this release, we have added business-critical features that make the platform more secure, more scalable, and simpler to manage for all of your data pipeline, analytics, and machine learning workloads. \nThe Databricks platform now provides stronger security controls required by regulated enterprises, is API-driven for better automation support, and increases the scalability of your production and business-critical operations. 
For more information, see our [blog post](https://databricks.com/blog/2020/09/01/databricks-unified-data-analytics-platform-for-aws-gets-a-major-upgrade.html).\n\n", "chunk_id": "cb2f475f24c3543005698a0cedcd3d47", "url": "https://docs.databricks.com/release-notes/product/2020/september.html"} +{"chunked_text": "# Databricks release notes\n## Databricks platform release notes\n#### September 2020\n##### Account API is generally available on the E2 version of the platform\n\n**September 1, 2020** \nAs part of the GA of the E2 version of the Databricks platform, the Multi-workspace API has been renamed the [Account API](https://docs.databricks.com/api/account/introduction), and all endpoints related to [workspace creation](https://docs.databricks.com/admin/workspace/create-workspace-api.html) and [customer-managed VPCs](https://docs.databricks.com/security/network/classic/customer-managed-vpc.html) are also GA. To use the Account API to create new workspaces, your account must be on the [E2 version of the platform](https://docs.databricks.com/archive/aws/end-of-life-legacy-workspaces.html#e2-architecture) or on a select custom plan that allows multiple workspaces per account. Only E2 accounts allow customer-managed VPCs. \n[Billable usage delivery configuration](https://docs.databricks.com/admin/account-settings/billable-usage-delivery.html) also requires the Account API. This feature is available on all Databricks accounts, but remains in Public Preview.\n\n#### September 2020\n##### Secure cluster connectivity (no public IPs) is now the default on the E2 version of the platform\n\n**September 1, 2020** \nAs part of the GA of the [E2 version of the Databricks platform](https://docs.databricks.com/archive/aws/end-of-life-legacy-workspaces.html#e2-architecture), secure cluster connectivity (no public IPs) is now the default for workspaces created on that version of the platform. 
\nFor more information, see [Secure cluster connectivity](https://docs.databricks.com/security/network/classic/secure-cluster-connectivity.html).\n\n", "chunk_id": "2258d05a547dabc87de3d789cfbd4552", "url": "https://docs.databricks.com/release-notes/product/2020/september.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `md5` function\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nReturns an MD5 128-bit checksum of `expr` as a hex string.\n\n####### `md5` function\n######## Syntax\n\n```\nmd5(expr)\n\n```\n\n####### `md5` function\n######## Arguments\n\n* `expr`: A BINARY expression.\n\n####### `md5` function\n######## Returns\n\nA STRING.\n\n####### `md5` function\n######## Examples\n\n```\n> SELECT md5('Spark');\n8cde774d6f7333752ed72cacddb05126\n\n```\n\n####### `md5` function\n######## Related functions\n\n* [crc32 function](https://docs.databricks.com/sql/language-manual/functions/crc32.html)\n* [hash function](https://docs.databricks.com/sql/language-manual/functions/hash.html)\n* [mask function](https://docs.databricks.com/sql/language-manual/functions/mask.html)\n* [sha function](https://docs.databricks.com/sql/language-manual/functions/sha.html)\n* [sha1 function](https://docs.databricks.com/sql/language-manual/functions/sha1.html)\n* [sha2 function](https://docs.databricks.com/sql/language-manual/functions/sha2.html)\n\n", "chunk_id": "8e252cdcb3af7dedaf3d2d73b675de00", "url": "https://docs.databricks.com/sql/language-manual/functions/md5.html"} +{"chunked_text": "# Develop on Databricks\n## SQL language reference\n### Functions\n#### Built-in functions\n##### Alphabetical list of built-in functions\n####### `chr` function\n\n**Applies to:** ![check marked 
yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime \nReturns the character at the supplied UTF-16 code point. This function is a synonym for [char function](https://docs.databricks.com/sql/language-manual/functions/char.html).\n\n####### `chr` function\n######## Syntax\n\n```\nchr(expr)\n\n```\n\n####### `chr` function\n######## Arguments\n\n* `expr`: An expression that evaluates to an integral numeric.\n\n####### `chr` function\n######## Returns\n\nThe result type is STRING. \nIf the argument is less than 0, an empty string is returned.\nIf the argument is larger than `255`, it is treated as modulo 256.\nThis implies `char` covers the ASCII and Latin-1 Supplement range of UTF-16.\n\n####### `chr` function\n######## Examples\n\n```\n> SELECT chr(65);\nA\n\n```\n\n####### `chr` function\n######## Related functions\n\n* [char function](https://docs.databricks.com/sql/language-manual/functions/char.html)\n* [ascii function](https://docs.databricks.com/sql/language-manual/functions/ascii.html)\n\n", "chunk_id": "70f54554615d771158b7f2fd9d779e9f", "url": "https://docs.databricks.com/sql/language-manual/functions/chr.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n\n**Applies to:** ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks SQL ![check marked yes](https://docs.databricks.com/_images/check.png) Databricks Runtime 12.2 and above \nAll error classes returned by Databricks are associated with a 5 character `SQLSTATE`.\nA `SQLSTATE` is a SQL standard encoding for error conditions commonly used by `JDBC`, `ODBC`, and other client APIs. \nA `SQLSTATE` consists of two portions: A two character class, and a three character subclass.\nEach character must be a digit `'0'` to `'9'` or `'A'` to `'Z'`. 
\nWhile many `SQLSTATE` values are prescribed by the SQL standard, others are common in the industry or specific to Spark or Databricks. \nWhere necessary, Spark and Databricks use the `'KD'` class and `'K**'` subclass ranges for custom SQLSTATEs.\nThe class `'XX'` is used for internal errors warranting a bug report. \nFor an ordered list of error classes, see [Error handling in Databricks](https://docs.databricks.com/error-messages/index.html). \nDatabricks uses the following `SQLSTATE` classes:\n\n", "chunk_id": "eb7e392a4804434c67af425c0da3a13e", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `07`: dynamic SQL error\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `07001` | using clause does not match dynamic parameter specifications |\n| | [ALL\\_PARAMETERS\\_MUST\\_BE\\_NAMED](https://docs.databricks.com/error-messages/error-classes.html#all_parameters_must_be_named) |\n| `07501` | The option specified on PREPARE or EXECUTE is not valid. 
|\n| | [INVALID\\_STATEMENT\\_FOR\\_EXECUTE\\_INTO](https://docs.databricks.com/error-messages/error-classes.html#invalid_statement_for_execute_into), [NESTED\\_EXECUTE\\_IMMEDIATE](https://docs.databricks.com/error-messages/error-classes.html#nested_execute_immediate) |\n\n", "chunk_id": "03e4c58a1e8a42ec3dabe5a42a7404ef", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `08`: connection exception\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `08000` | connection exception |\n| | [AI\\_FUNCTION\\_HTTP\\_REQUEST\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#ai_function_http_request_error), [AI\\_FUNCTION\\_INVALID\\_HTTP\\_RESPONSE](https://docs.databricks.com/error-messages/error-classes.html#ai_function_invalid_http_response), [CANNOT\\_VALIDATE\\_CONNECTION](https://docs.databricks.com/error-messages/error-classes.html#cannot_validate_connection) |\n| `08001` | SQL-client unable to establish SQL-connection |\n| | [CANNOT\\_ESTABLISH\\_CONNECTION](https://docs.databricks.com/error-messages/error-classes.html#cannot_establish_connection), [CANNOT\\_ESTABLISH\\_CONNECTION\\_SERVERLESS](https://docs.databricks.com/error-messages/error-classes.html#cannot_establish_connection_serverless) |\n| `08003` | connection does not exist |\n| | [DELTA\\_ACTIVE\\_SPARK\\_SESSION\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_active_spark_session_not_found) |\n| `08KD1` | server busy |\n| | [SERVER\\_IS\\_BUSY](https://docs.databricks.com/error-messages/error-classes.html#server_is_busy) |\n\n", "chunk_id": "93bd67d57656f42556570f4e1558b90a", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `0A`: 
feature not supported\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `0A000` | feature not supported |\n| | [AI\\_FUNCTION\\_UNSUPPORTED\\_REQUEST](https://docs.databricks.com/error-messages/error-classes.html#ai_function_unsupported_request), [AI\\_FUNCTION\\_UNSUPPORTED\\_RETURN\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#ai_function_unsupported_return_type), [AVRO\\_DEFAULT\\_VALUES\\_UNSUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#avro_default_values_unsupported), [AVRO\\_POSITIONAL\\_FIELD\\_MATCHING\\_UNSUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#avro_positional_field_matching_unsupported), [CANNOT\\_INVOKE\\_IN\\_TRANSFORMATIONS](https://docs.databricks.com/error-messages/error-classes.html#cannot_invoke_in_transformations), [CANNOT\\_SAVE\\_VARIANT](https://docs.databricks.com/error-messages/error-classes.html#cannot_save_variant), [CANNOT\\_UPDATE\\_FIELD](https://docs.databricks.com/error-messages/cannot-update-field-error-class.html), [CF\\_ADD\\_NEW\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#cf_add_new_not_supported), [CF\\_EVENT\\_NOTIFICATION\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#cf_event_notification_not_supported), [CF\\_PERIODIC\\_BACKFILL\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#cf_periodic_backfill_not_supported), [CF\\_SOURCE\\_UNSUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#cf_source_unsupported), [CF\\_STATEFUL\\_STREAMING\\_SCHEMA\\_EVOLUTION](https://docs.databricks.com/error-messages/error-classes.html#cf_stateful_streaming_schema_evolution), [CF\\_UNSUPPORTED\\_CLOUD\\_FILES\\_SQL\\_FUNCTION](https://docs.databricks.com/error-messages/error-classes.html#cf_unsupported_cloud_files_sql_function), 
[CF\\_UNSUPPORTED\\_FORMAT\\_FOR\\_SCHEMA\\_INFERENCE](https://docs.databricks.com/error-messages/error-classes.html#cf_unsupported_format_for_schema_inference), [CF\\_UNSUPPORTED\\_LOG\\_VERSION](https://docs.databricks.com/error-messages/error-classes.html#cf_unsupported_log_version), [CF\\_UNSUPPORTED\\_SCHEMA\\_EVOLUTION\\_MODE](https://docs.databricks.com/error-messages/error-classes.html#cf_unsupported_schema_evolution_mode), [CLASS\\_UNSUPPORTED\\_BY\\_MAP\\_OBJECTS](https://docs.databricks.com/error-messages/error-classes.html#class_unsupported_by_map_objects), [CLEANROOM\\_COMMANDS\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#cleanroom_commands_not_supported), [COLUMN\\_MASKS\\_CHECK\\_CONSTRAINT\\_UNSUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#column_masks_check_constraint_unsupported), [COLUMN\\_MASKS\\_FEATURE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/column-masks-feature-not-supported-error-class.html), [COLUMN\\_MASKS\\_INCOMPATIBLE\\_SCHEMA\\_CHANGE](https://docs.databricks.com/error-messages/error-classes.html#column_masks_incompatible_schema_change), [COLUMN\\_MASKS\\_MERGE\\_UNSUPPORTED\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#column_masks_merge_unsupported_source), [COLUMN\\_MASKS\\_MERGE\\_UNSUPPORTED\\_TARGET](https://docs.databricks.com/error-messages/error-classes.html#column_masks_merge_unsupported_target), [COLUMN\\_MASKS\\_REQUIRE\\_UNITY\\_CATALOG](https://docs.databricks.com/error-messages/error-classes.html#column_masks_require_unity_catalog), [COLUMN\\_MASKS\\_TABLE\\_CLONE\\_SOURCE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#column_masks_table_clone_source_not_supported), [COLUMN\\_MASKS\\_TABLE\\_CLONE\\_TARGET\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#column_masks_table_clone_target_not_supported), 
[COLUMN\\_MASKS\\_UNSUPPORTED\\_PROVIDER](https://docs.databricks.com/error-messages/error-classes.html#column_masks_unsupported_provider), [COLUMN\\_MASKS\\_UNSUPPORTED\\_SUBQUERY](https://docs.databricks.com/error-messages/error-classes.html#column_masks_unsupported_subquery), [CONCURRENT\\_QUERY](https://docs.databricks.com/error-messages/error-classes.html#concurrent_query), [CONSTRAINTS\\_REQUIRE\\_UNITY\\_CATALOG](https://docs.databricks.com/error-messages/error-classes.html#constraints_require_unity_catalog), [COPY\\_INTO\\_CREDENTIALS\\_NOT\\_ALLOWED\\_ON](https://docs.databricks.com/error-messages/error-classes.html#copy_into_credentials_not_allowed_on), [COPY\\_INTO\\_ENCRYPTION\\_NOT\\_ALLOWED\\_ON](https://docs.databricks.com/error-messages/error-classes.html#copy_into_encryption_not_allowed_on), [COPY\\_INTO\\_ENCRYPTION\\_NOT\\_SUPPORTED\\_FOR\\_AZURE](https://docs.databricks.com/error-messages/error-classes.html#copy_into_encryption_not_supported_for_azure), [COPY\\_INTO\\_SOURCE\\_FILE\\_FORMAT\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#copy_into_source_file_format_not_supported), [CREATE\\_OR\\_REFRESH\\_MV\\_ST\\_ASYNC](https://docs.databricks.com/error-messages/error-classes.html#create_or_refresh_mv_st_async), [CREATE\\_PERMANENT\\_VIEW\\_WITHOUT\\_ALIAS](https://docs.databricks.com/error-messages/error-classes.html#create_permanent_view_without_alias), [CSV\\_ENFORCE\\_SCHEMA\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#csv_enforce_schema_not_supported), [DC\\_FEATURE\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/error-classes.html#dc_feature_not_enabled), [DC\\_UNSUPPORTED\\_ERROR](https://docs.databricks.com/error-messages/dc-unsupported-error-error-class.html), [DELTA\\_ADDING\\_DELETION\\_VECTORS\\_DISALLOWED](https://docs.databricks.com/error-messages/error-classes.html#delta_adding_deletion_vectors_disallowed), 
[DELTA\\_ADD\\_CONSTRAINTS](https://docs.databricks.com/error-messages/error-classes.html#delta_add_constraints), [DELTA\\_CANNOT\\_WRITE\\_INTO\\_VIEW](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_write_into_view), [DELTA\\_CLUSTERING\\_CLONE\\_TABLE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#delta_clustering_clone_table_not_supported), [DELTA\\_CLUSTERING\\_SHOW\\_CREATE\\_TABLE\\_WITHOUT\\_CLUSTERING\\_COLUMNS](https://docs.databricks.com/error-messages/error-classes.html#delta_clustering_show_create_table_without_clustering_columns), [DELTA\\_CLUSTERING\\_WITH\\_PARTITION\\_PREDICATE](https://docs.databricks.com/error-messages/error-classes.html#delta_clustering_with_partition_predicate), [DELTA\\_DOMAIN\\_METADATA\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#delta_domain_metadata_not_supported), [DELTA\\_DYNAMIC\\_PARTITION\\_OVERWRITE\\_DISABLED](https://docs.databricks.com/error-messages/error-classes.html#delta_dynamic_partition_overwrite_disabled), [DELTA\\_NESTED\\_SUBQUERY\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#delta_nested_subquery_not_supported), [DELTA\\_NOT\\_NULL\\_NESTED\\_FIELD](https://docs.databricks.com/error-messages/error-classes.html#delta_not_null_nested_field), [DELTA\\_OPERATION\\_ON\\_TEMP\\_VIEW\\_WITH\\_GENERATED\\_COLS\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#delta_operation_on_temp_view_with_generated_cols_not_supported), [DELTA\\_SOURCE\\_IGNORE\\_DELETE](https://docs.databricks.com/error-messages/error-classes.html#delta_source_ignore_delete), [DELTA\\_SOURCE\\_TABLE\\_IGNORE\\_CHANGES](https://docs.databricks.com/error-messages/error-classes.html#delta_source_table_ignore_changes), [DELTA\\_UNSUPPORTED\\_DEEP\\_CLONE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_deep_clone), 
[DELTA\\_UNSUPPORTED\\_EXPRESSION](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_expression), [DELTA\\_UNSUPPORTED\\_FSCK\\_WITH\\_DELETION\\_VECTORS](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_fsck_with_deletion_vectors), [DELTA\\_UNSUPPORTED\\_GENERATE\\_WITH\\_DELETION\\_VECTORS](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_generate_with_deletion_vectors), [DELTA\\_UNSUPPORTED\\_LIST\\_KEYS\\_WITH\\_PREFIX](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_list_keys_with_prefix), [DELTA\\_UNSUPPORTED\\_MERGE\\_SCHEMA\\_EVOLUTION\\_WITH\\_CDC](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_merge_schema_evolution_with_cdc), [DELTA\\_UNSUPPORTED\\_SORT\\_ON\\_BUCKETED\\_TABLES](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_sort_on_bucketed_tables), [DELTA\\_UNSUPPORTED\\_TRUNCATE\\_SAMPLE\\_TABLES](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_truncate_sample_tables), [DELTA\\_UNSUPPORTED\\_WRITE\\_SAMPLE\\_TABLES](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_write_sample_tables), [DELTA\\_VIOLATE\\_TABLE\\_PROPERTY\\_VALIDATION\\_FAILED](https://docs.databricks.com/error-messages/delta-violate-table-property-validation-failed-error-class.html), [DELTA\\_WRITE\\_INTO\\_VIEW\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#delta_write_into_view_not_supported), [DISTINCT\\_WINDOW\\_FUNCTION\\_UNSUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#distinct_window_function_unsupported), [EXTERNAL\\_TABLE\\_INVALID\\_SCHEME](https://docs.databricks.com/error-messages/error-classes.html#external_table_invalid_scheme), [FABRIC\\_REFRESH\\_INVALID\\_SCOPE](https://docs.databricks.com/error-messages/error-classes.html#fabric_refresh_invalid_scope), 
[FROM\\_JSON\\_INFERENCE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#from_json_inference_not_supported), [H3\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/h3-not-enabled-error-class.html), [INFINITE\\_STREAMING\\_TRIGGER\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#infinite_streaming_trigger_not_supported), [INVALID\\_PANDAS\\_UDF\\_PLACEMENT](https://docs.databricks.com/error-messages/error-classes.html#invalid_pandas_udf_placement), [MATERIALIZED\\_VIEW\\_OUTPUT\\_WITHOUT\\_EXPLICIT\\_ALIAS](https://docs.databricks.com/error-messages/error-classes.html#materialized_view_output_without_explicit_alias), [MATERIALIZED\\_VIEW\\_UNSUPPORTED\\_OPERATION](https://docs.databricks.com/error-messages/error-classes.html#materialized_view_unsupported_operation), [MULTI\\_UDF\\_INTERFACE\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#multi_udf_interface_error), [NAMED\\_PARAMETERS\\_NOT\\_SUPPORTED\\_FOR\\_SQL\\_UDFS](https://docs.databricks.com/error-messages/error-classes.html#named_parameters_not_supported_for_sql_udfs), [NAMED\\_PARAMETER\\_SUPPORT\\_DISABLED](https://docs.databricks.com/error-messages/error-classes.html#named_parameter_support_disabled), [NOT\\_SUPPORTED\\_CHANGE\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#not_supported_change_column), [NOT\\_SUPPORTED\\_COMMAND\\_FOR\\_V2\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#not_supported_command_for_v2_table), [NOT\\_SUPPORTED\\_COMMAND\\_WITHOUT\\_HIVE\\_SUPPORT](https://docs.databricks.com/error-messages/error-classes.html#not_supported_command_without_hive_support), [NOT\\_SUPPORTED\\_IN\\_JDBC\\_CATALOG](https://docs.databricks.com/error-messages/not-supported-in-jdbc-catalog-error-class.html), [NOT\\_SUPPORTED\\_WITH\\_DB\\_SQL](https://docs.databricks.com/error-messages/error-classes.html#not_supported_with_db_sql), 
[PROTOBUF\\_JAVA\\_CLASSES\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#protobuf_java_classes_not_supported), [PS\\_SPARK\\_SPECULATION\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#ps_spark_speculation_not_supported), [PS\\_UNSUPPORTED\\_GET\\_OFFSET\\_CALL](https://docs.databricks.com/error-messages/error-classes.html#ps_unsupported_get_offset_call), [RESTRICTED\\_STREAMING\\_OPTION\\_PERMISSION\\_ENFORCED](https://docs.databricks.com/error-messages/restricted-streaming-option-permission-enforced-error-class.html), [ROW\\_LEVEL\\_SECURITY\\_CHECK\\_CONSTRAINT\\_UNSUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#row_level_security_check_constraint_unsupported), [ROW\\_LEVEL\\_SECURITY\\_FEATURE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/row-level-security-feature-not-supported-error-class.html), [ROW\\_LEVEL\\_SECURITY\\_INCOMPATIBLE\\_SCHEMA\\_CHANGE](https://docs.databricks.com/error-messages/error-classes.html#row_level_security_incompatible_schema_change), [ROW\\_LEVEL\\_SECURITY\\_MERGE\\_UNSUPPORTED\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#row_level_security_merge_unsupported_source), [ROW\\_LEVEL\\_SECURITY\\_MERGE\\_UNSUPPORTED\\_TARGET](https://docs.databricks.com/error-messages/error-classes.html#row_level_security_merge_unsupported_target), [ROW\\_LEVEL\\_SECURITY\\_REQUIRE\\_UNITY\\_CATALOG](https://docs.databricks.com/error-messages/error-classes.html#row_level_security_require_unity_catalog), [ROW\\_LEVEL\\_SECURITY\\_TABLE\\_CLONE\\_SOURCE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#row_level_security_table_clone_source_not_supported), [ROW\\_LEVEL\\_SECURITY\\_TABLE\\_CLONE\\_TARGET\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#row_level_security_table_clone_target_not_supported), 
[ROW\\_LEVEL\\_SECURITY\\_UNSUPPORTED\\_PROVIDER](https://docs.databricks.com/error-messages/error-classes.html#row_level_security_unsupported_provider), [SCALAR\\_SUBQUERY\\_IS\\_IN\\_GROUP\\_BY\\_OR\\_AGGREGATE\\_FUNCTION](https://docs.databricks.com/error-messages/error-classes.html#scalar_subquery_is_in_group_by_or_aggregate_function), [STAR\\_GROUP\\_BY\\_POS](https://docs.databricks.com/error-messages/error-classes.html#star_group_by_pos), [STORED\\_PROCEDURE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#stored_procedure_not_supported), [STREAMING\\_AQE\\_NOT\\_SUPPORTED\\_FOR\\_STATEFUL\\_OPERATORS](https://docs.databricks.com/error-messages/error-classes.html#streaming_aqe_not_supported_for_stateful_operators), [STREAMING\\_FROM\\_MATERIALIZED\\_VIEW](https://docs.databricks.com/error-messages/error-classes.html#streaming_from_materialized_view), [ST\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/error-classes.html#st_not_enabled), [ST\\_UNSUPPORTED\\_RETURN\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#st_unsupported_return_type), [TABLE\\_VALUED\\_ARGUMENTS\\_NOT\\_YET\\_IMPLEMENTED\\_FOR\\_SQL\\_FUNCTIONS](https://docs.databricks.com/error-messages/error-classes.html#table_valued_arguments_not_yet_implemented_for_sql_functions), [UC\\_HIVE\\_METASTORE\\_FEDERATION\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/error-classes.html#uc_hive_metastore_federation_not_enabled), [UDF\\_PYSPARK\\_UNSUPPORTED\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#udf_pyspark_unsupported_type), [UDF\\_UNSUPPORTED\\_PARAMETER\\_DEFAULT\\_VALUE](https://docs.databricks.com/error-messages/error-classes.html#udf_unsupported_parameter_default_value), [UDTF\\_PYSPARK\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#udtf_pyspark_not_supported), 
[UNSUPPORTED\\_ADD\\_FILE](https://docs.databricks.com/error-messages/unsupported-add-file-error-class.html), [UNSUPPORTED\\_ARROWTYPE](https://docs.databricks.com/error-messages/error-classes.html#unsupported_arrowtype), [UNSUPPORTED\\_CHAR\\_OR\\_VARCHAR\\_AS\\_STRING](https://docs.databricks.com/error-messages/error-classes.html#unsupported_char_or_varchar_as_string), [UNSUPPORTED\\_CLAUSE\\_FOR\\_OPERATION](https://docs.databricks.com/error-messages/error-classes.html#unsupported_clause_for_operation), [UNSUPPORTED\\_CONSTRAINT\\_CLAUSES](https://docs.databricks.com/error-messages/error-classes.html#unsupported_constraint_clauses), [UNSUPPORTED\\_DATASOURCE\\_FOR\\_DIRECT\\_QUERY](https://docs.databricks.com/error-messages/error-classes.html#unsupported_datasource_for_direct_query), [UNSUPPORTED\\_DATATYPE](https://docs.databricks.com/error-messages/error-classes.html#unsupported_datatype), [UNSUPPORTED\\_DATA\\_SOURCE\\_SAVE\\_MODE](https://docs.databricks.com/error-messages/error-classes.html#unsupported_data_source_save_mode), [UNSUPPORTED\\_DATA\\_TYPE\\_FOR\\_DATASOURCE](https://docs.databricks.com/error-messages/error-classes.html#unsupported_data_type_for_datasource), [UNSUPPORTED\\_DEFAULT\\_VALUE](https://docs.databricks.com/error-messages/unsupported-default-value-error-class.html), [UNSUPPORTED\\_DESERIALIZER](https://docs.databricks.com/error-messages/unsupported-deserializer-error-class.html), [UNSUPPORTED\\_FEATURE](https://docs.databricks.com/error-messages/unsupported-feature-error-class.html), [UNSUPPORTED\\_FN\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#unsupported_fn_type), [UNSUPPORTED\\_NESTED\\_ROW\\_OR\\_COLUMN\\_ACCESS\\_POLICY](https://docs.databricks.com/error-messages/error-classes.html#unsupported_nested_row_or_column_access_policy), [UNSUPPORTED\\_SAVE\\_MODE](https://docs.databricks.com/error-messages/unsupported-save-mode-error-class.html), 
[UNSUPPORTED\\_STREAMING\\_OPTIONS\\_FOR\\_VIEW](https://docs.databricks.com/error-messages/unsupported-streaming-options-for-view-error-class.html), [UNSUPPORTED\\_STREAMING\\_OPTIONS\\_PERMISSION\\_ENFORCED](https://docs.databricks.com/error-messages/error-classes.html#unsupported_streaming_options_permission_enforced), [UNSUPPORTED\\_STREAMING\\_SINK\\_PERMISSION\\_ENFORCED](https://docs.databricks.com/error-messages/error-classes.html#unsupported_streaming_sink_permission_enforced), [UNSUPPORTED\\_STREAMING\\_SOURCE\\_PERMISSION\\_ENFORCED](https://docs.databricks.com/error-messages/error-classes.html#unsupported_streaming_source_permission_enforced), [UNSUPPORTED\\_STREAM\\_READ\\_LIMIT\\_FOR\\_KINESIS\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#unsupported_stream_read_limit_for_kinesis_source), [UNSUPPORTED\\_SUBQUERY\\_EXPRESSION\\_CATEGORY](https://docs.databricks.com/error-messages/unsupported-subquery-expression-category-error-class.html), [UNSUPPORTED\\_TIMESERIES\\_WITH\\_MORE\\_THAN\\_ONE\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#unsupported_timeseries_with_more_than_one_column), [UNSUPPORTED\\_TRIGGER\\_FOR\\_KINESIS\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#unsupported_trigger_for_kinesis_source), [UNSUPPORTED\\_TYPED\\_LITERAL](https://docs.databricks.com/error-messages/error-classes.html#unsupported_typed_literal) |\n| `0AKD0` | Cross catalog or schema operation not supported |\n| | [CANNOT\\_COPY\\_STATE](https://docs.databricks.com/error-messages/error-classes.html#cannot_copy_state), [CANNOT\\_REFERENCE\\_UC\\_IN\\_HMS](https://docs.databricks.com/error-messages/error-classes.html#cannot_reference_uc_in_hms), [CANNOT\\_RENAME\\_ACROSS\\_CATALOG](https://docs.databricks.com/error-messages/error-classes.html#cannot_rename_across_catalog), 
[CANNOT\\_RENAME\\_ACROSS\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#cannot_rename_across_schema), [CANNOT\\_SHALLOW\\_CLONE\\_ACROSS\\_UC\\_AND\\_HMS](https://docs.databricks.com/error-messages/error-classes.html#cannot_shallow_clone_across_uc_and_hms) |\n| `0AKD1` | Security feature not supported |\n| | [COLUMN\\_MASKS\\_UNSUPPORTED\\_CONSTANT\\_AS\\_PARAMETER](https://docs.databricks.com/error-messages/error-classes.html#column_masks_unsupported_constant_as_parameter), [ROW\\_LEVEL\\_SECURITY\\_UNSUPPORTED\\_CONSTANT\\_AS\\_PARAMETER](https://docs.databricks.com/error-messages/error-classes.html#row_level_security_unsupported_constant_as_parameter) |\n| `0AKDC` | Not supported in Delta |\n| | [DELTA\\_CANNOT\\_EVALUATE\\_EXPRESSION](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_evaluate_expression), [DELTA\\_CANNOT\\_GENERATE\\_CODE\\_FOR\\_EXPRESSION](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_generate_code_for_expression), [DELTA\\_CDC\\_NOT\\_ALLOWED\\_IN\\_THIS\\_VERSION](https://docs.databricks.com/error-messages/error-classes.html#delta_cdc_not_allowed_in_this_version), [DELTA\\_CHANGE\\_DATA\\_FEED\\_INCOMPATIBLE\\_DATA\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#delta_change_data_feed_incompatible_data_schema), [DELTA\\_CHANGE\\_DATA\\_FEED\\_INCOMPATIBLE\\_SCHEMA\\_CHANGE](https://docs.databricks.com/error-messages/error-classes.html#delta_change_data_feed_incompatible_schema_change), [DELTA\\_CLONE\\_UNSUPPORTED\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#delta_clone_unsupported_source), [DELTA\\_COLUMN\\_DATA\\_SKIPPING\\_NOT\\_SUPPORTED\\_PARTITIONED\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_column_data_skipping_not_supported_partitioned_column), 
[DELTA\\_COLUMN\\_DATA\\_SKIPPING\\_NOT\\_SUPPORTED\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#delta_column_data_skipping_not_supported_type), [DELTA\\_CONVERSION\\_UNSUPPORTED\\_COLUMN\\_MAPPING](https://docs.databricks.com/error-messages/error-classes.html#delta_conversion_unsupported_column_mapping), [DELTA\\_CONVERT\\_NON\\_PARQUET\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_convert_non_parquet_table), [DELTA\\_INCORRECT\\_LOG\\_STORE\\_IMPLEMENTATION](https://docs.databricks.com/error-messages/error-classes.html#delta_incorrect_log_store_implementation), [DELTA\\_MISSING\\_PROVIDER\\_FOR\\_CONVERT](https://docs.databricks.com/error-messages/error-classes.html#delta_missing_provider_for_convert), [DELTA\\_MODE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#delta_mode_not_supported), [DELTA\\_NESTED\\_NOT\\_NULL\\_CONSTRAINT](https://docs.databricks.com/error-messages/error-classes.html#delta_nested_not_null_constraint), [DELTA\\_NON\\_DETERMINISTIC\\_FUNCTION\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#delta_non_deterministic_function_not_supported), [DELTA\\_OPERATION\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#delta_operation_not_allowed), [DELTA\\_OPERATION\\_NOT\\_ALLOWED\\_DETAIL](https://docs.databricks.com/error-messages/error-classes.html#delta_operation_not_allowed_detail), [DELTA\\_TABLE\\_FOR\\_PATH\\_UNSUPPORTED\\_HADOOP\\_CONF](https://docs.databricks.com/error-messages/error-classes.html#delta_table_for_path_unsupported_hadoop_conf), [DELTA\\_TRUNCATE\\_TABLE\\_PARTITION\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#delta_truncate_table_partition_not_supported), [DELTA\\_UNIFORM\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#delta_uniform_not_supported), 
[DELTA\\_UNSUPPORTED\\_ABS\\_PATH\\_ADD\\_FILE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_abs_path_add_file), [DELTA\\_UNSUPPORTED\\_ALTER\\_TABLE\\_CHANGE\\_COL\\_OP](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_alter_table_change_col_op), [DELTA\\_UNSUPPORTED\\_ALTER\\_TABLE\\_REPLACE\\_COL\\_OP](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_alter_table_replace_col_op), [DELTA\\_UNSUPPORTED\\_CLONE\\_REPLACE\\_SAME\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_clone_replace_same_table), [DELTA\\_UNSUPPORTED\\_COLUMN\\_MAPPING\\_MODE\\_CHANGE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_column_mapping_mode_change), [DELTA\\_UNSUPPORTED\\_COLUMN\\_MAPPING\\_SCHEMA\\_CHANGE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_column_mapping_schema_change), [DELTA\\_UNSUPPORTED\\_COLUMN\\_MAPPING\\_WRITE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_column_mapping_write), [DELTA\\_UNSUPPORTED\\_COLUMN\\_TYPE\\_IN\\_BLOOM\\_FILTER](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_column_type_in_bloom_filter), [DELTA\\_UNSUPPORTED\\_COMMENT\\_MAP\\_ARRAY](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_comment_map_array), [DELTA\\_UNSUPPORTED\\_DATA\\_TYPES](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_data_types), [DELTA\\_UNSUPPORTED\\_DROP\\_CLUSTERING\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_drop_clustering_column), [DELTA\\_UNSUPPORTED\\_DROP\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_drop_column), 
[DELTA\\_UNSUPPORTED\\_DROP\\_NESTED\\_COLUMN\\_FROM\\_NON\\_STRUCT\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_drop_nested_column_from_non_struct_type), [DELTA\\_UNSUPPORTED\\_DROP\\_PARTITION\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_drop_partition_column), [DELTA\\_UNSUPPORTED\\_FIELD\\_UPDATE\\_NON\\_STRUCT](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_field_update_non_struct), [DELTA\\_UNSUPPORTED\\_INVARIANT\\_NON\\_STRUCT](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_invariant_non_struct), [DELTA\\_UNSUPPORTED\\_IN\\_SUBQUERY](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_in_subquery), [DELTA\\_UNSUPPORTED\\_MANIFEST\\_GENERATION\\_WITH\\_COLUMN\\_MAPPING](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_manifest_generation_with_column_mapping), [DELTA\\_UNSUPPORTED\\_MULTI\\_COL\\_IN\\_PREDICATE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_multi_col_in_predicate), [DELTA\\_UNSUPPORTED\\_NESTED\\_COLUMN\\_IN\\_BLOOM\\_FILTER](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_nested_column_in_bloom_filter), [DELTA\\_UNSUPPORTED\\_NESTED\\_FIELD\\_IN\\_OPERATION](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_nested_field_in_operation), [DELTA\\_UNSUPPORTED\\_NON\\_EMPTY\\_CLONE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_non_empty_clone), [DELTA\\_UNSUPPORTED\\_OUTPUT\\_MODE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_output_mode), [DELTA\\_UNSUPPORTED\\_PARTITION\\_COLUMN\\_IN\\_BLOOM\\_FILTER](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_partition_column_in_bloom_filter), 
[DELTA\\_UNSUPPORTED\\_RENAME\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_rename_column), [DELTA\\_UNSUPPORTED\\_SCHEMA\\_DURING\\_READ](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_schema_during_read), [DELTA\\_UNSUPPORTED\\_SUBQUERY](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_subquery), [DELTA\\_UNSUPPORTED\\_SUBQUERY\\_IN\\_PARTITION\\_PREDICATES](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_subquery_in_partition_predicates), [DELTA\\_UNSUPPORTED\\_TIME\\_TRAVEL\\_VIEWS](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_time_travel_views), [DELTA\\_UNSUPPORTED\\_VACUUM\\_SPECIFIC\\_PARTITION](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_vacuum_specific_partition), [WRONG\\_COLUMN\\_DEFAULTS\\_FOR\\_DELTA\\_ALTER\\_TABLE\\_ADD\\_COLUMN\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#wrong_column_defaults_for_delta_alter_table_add_column_not_supported) |\n| `0AKDD` | Feature requires Delta |\n| | [DELTA\\_COPY\\_INTO\\_TARGET\\_FORMAT](https://docs.databricks.com/error-messages/error-classes.html#delta_copy_into_target_format), [DELTA\\_NOT\\_A\\_DELTA\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_not_a_delta_table), [DELTA\\_ONLY\\_OPERATION](https://docs.databricks.com/error-messages/error-classes.html#delta_only_operation), [DELTA\\_TABLE\\_ONLY\\_OPERATION](https://docs.databricks.com/error-messages/error-classes.html#delta_table_only_operation), [DELTA\\_UNSUPPORTED\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_source), [DELTA\\_UNSUPPORTED\\_STATIC\\_PARTITIONS](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_static_partitions), 
[SYNC\\_METADATA\\_DELTA\\_ONLY](https://docs.databricks.com/error-messages/error-classes.html#sync_metadata_delta_only), [UNSUPPORTED\\_MANAGED\\_TABLE\\_CREATION](https://docs.databricks.com/error-messages/error-classes.html#unsupported_managed_table_creation) |\n| `0AKDE` | Feature not enabled for this Delta table |\n| | [DELTA\\_CLUSTERING\\_PHASE\\_OUT\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#delta_clustering_phase_out_failed), [DELTA\\_DATA\\_CHANGE\\_FALSE](https://docs.databricks.com/error-messages/error-classes.html#delta_data_change_false), [DELTA\\_FEATURES\\_PROTOCOL\\_METADATA\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_features_protocol_metadata_mismatch), [DELTA\\_FEATURES\\_REQUIRE\\_MANUAL\\_ENABLEMENT](https://docs.databricks.com/error-messages/error-classes.html#delta_features_require_manual_enablement), [DELTA\\_FEATURE\\_DROP\\_CONFLICT\\_REVALIDATION\\_FAIL](https://docs.databricks.com/error-messages/error-classes.html#delta_feature_drop_conflict_revalidation_fail), [DELTA\\_FEATURE\\_DROP\\_FEATURE\\_NOT\\_PRESENT](https://docs.databricks.com/error-messages/error-classes.html#delta_feature_drop_feature_not_present), [DELTA\\_FEATURE\\_DROP\\_HISTORICAL\\_VERSIONS\\_EXIST](https://docs.databricks.com/error-messages/error-classes.html#delta_feature_drop_historical_versions_exist), [DELTA\\_FEATURE\\_DROP\\_HISTORY\\_TRUNCATION\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#delta_feature_drop_history_truncation_not_allowed), [DELTA\\_FEATURE\\_DROP\\_NONREMOVABLE\\_FEATURE](https://docs.databricks.com/error-messages/error-classes.html#delta_feature_drop_nonremovable_feature), [DELTA\\_FEATURE\\_DROP\\_UNSUPPORTED\\_CLIENT\\_FEATURE](https://docs.databricks.com/error-messages/error-classes.html#delta_feature_drop_unsupported_client_feature), 
[DELTA\\_FEATURE\\_DROP\\_WAIT\\_FOR\\_RETENTION\\_PERIOD](https://docs.databricks.com/error-messages/error-classes.html#delta_feature_drop_wait_for_retention_period), [DELTA\\_FEATURE\\_REQUIRES\\_HIGHER\\_READER\\_VERSION](https://docs.databricks.com/error-messages/error-classes.html#delta_feature_requires_higher_reader_version), [DELTA\\_FEATURE\\_REQUIRES\\_HIGHER\\_WRITER\\_VERSION](https://docs.databricks.com/error-messages/error-classes.html#delta_feature_requires_higher_writer_version), [DELTA\\_UNSUPPORTED\\_FEATURE\\_STATUS](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_feature_status), [WRONG\\_COLUMN\\_DEFAULTS\\_FOR\\_DELTA\\_FEATURE\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/error-classes.html#wrong_column_defaults_for_delta_feature_not_enabled) |\n| `0AKUC` | Not supported in Unity Catalog |\n| | [CANNOT\\_SHALLOW\\_CLONE\\_NESTED](https://docs.databricks.com/error-messages/error-classes.html#cannot_shallow_clone_nested), [CANNOT\\_SHALLOW\\_CLONE\\_NON\\_UC\\_MANAGED\\_TABLE\\_AS\\_SOURCE\\_OR\\_TARGET](https://docs.databricks.com/error-messages/error-classes.html#cannot_shallow_clone_non_uc_managed_table_as_source_or_target), [INVALID\\_SCHEME](https://docs.databricks.com/error-messages/error-classes.html#invalid_scheme), [PARTITION\\_METADATA](https://docs.databricks.com/error-messages/error-classes.html#partition_metadata), [UC\\_BUCKETED\\_TABLES](https://docs.databricks.com/error-messages/error-classes.html#uc_bucketed_tables), [UC\\_COMMAND\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/uc-command-not-supported-error-class.html), [UC\\_DATASOURCE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#uc_datasource_not_supported), [UC\\_DATASOURCE\\_OPTIONS\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#uc_datasource_options_not_supported), 
[UC\\_FILE\\_SCHEME\\_FOR\\_TABLE\\_CREATION\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#uc_file_scheme_for_table_creation_not_supported), [UC\\_INVALID\\_NAMESPACE](https://docs.databricks.com/error-messages/error-classes.html#uc_invalid_namespace), [UC\\_INVALID\\_REFERENCE](https://docs.databricks.com/error-messages/error-classes.html#uc_invalid_reference), [UPGRADE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/upgrade-not-supported-error-class.html) |\n| `0AKUD` | Feature requires Unity Catalog |\n| | [FEATURE\\_REQUIRES\\_UC](https://docs.databricks.com/error-messages/error-classes.html#feature_requires_uc), [OPERATION\\_REQUIRES\\_UNITY\\_CATALOG](https://docs.databricks.com/error-messages/error-classes.html#operation_requires_unity_catalog), [SYNC\\_METADATA\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#sync_metadata_not_supported) |\n\n", "chunk_id": "ac7874ba5681dfcd61540a293025dfb9", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `0B`: invalid transaction initiation\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `0B000` | invalid transaction initiation |\n| | [DELTA\\_ACTIVE\\_TRANSACTION\\_ALREADY\\_SET](https://docs.databricks.com/error-messages/error-classes.html#delta_active_transaction_already_set) |\n\n#### SQLSTATE error codes\n##### Class `0N`: SQL/XML mapping error\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `0N000` | SQL/XML mapping error |\n| | [XML\\_UNSUPPORTED\\_NESTED\\_TYPES](https://docs.databricks.com/error-messages/error-classes.html#xml_unsupported_nested_types) |\n\n", "chunk_id": "649e61f715aa2d8f1e3baa7e5c0f2ff7", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## 
Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `21`: cardinality violation\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `21000` | cardinality violation |\n| | [ROW\\_SUBQUERY\\_TOO\\_MANY\\_ROWS](https://docs.databricks.com/error-messages/error-classes.html#row_subquery_too_many_rows), [SCALAR\\_SUBQUERY\\_TOO\\_MANY\\_ROWS](https://docs.databricks.com/error-messages/error-classes.html#scalar_subquery_too_many_rows) |\n| `21506` | The same row of a table cannot be the target for more than one of an update, delete or insert operation. |\n| | [DELTA\\_MULTIPLE\\_SOURCE\\_ROW\\_MATCHING\\_TARGET\\_ROW\\_IN\\_MERGE](https://docs.databricks.com/error-messages/error-classes.html#delta_multiple_source_row_matching_target_row_in_merge) |\n| `21S01` | Insert value list does not match column list |\n| | [CREATE\\_VIEW\\_COLUMN\\_ARITY\\_MISMATCH](https://docs.databricks.com/error-messages/create-view-column-arity-mismatch-error-class.html), [INSERT\\_COLUMN\\_ARITY\\_MISMATCH](https://docs.databricks.com/error-messages/insert-column-arity-mismatch-error-class.html), [INSERT\\_PARTITION\\_COLUMN\\_ARITY\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#insert_partition_column_arity_mismatch) |\n\n", "chunk_id": "e706be1eaf6028e5b6b1f185af066774", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `22`: data exception\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `22000` | data exception |\n| | [AI\\_FUNCTION\\_JSON\\_PARSE\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#ai_function_json_parse_error), [CF\\_BUCKET\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#cf_bucket_mismatch), 
[CF\\_CANNOT\\_EVOLVE\\_SCHEMA\\_LOG\\_EMPTY](https://docs.databricks.com/error-messages/error-classes.html#cf_cannot_evolve_schema_log_empty), [CF\\_CANNOT\\_PARSE\\_QUEUE\\_MESSAGE](https://docs.databricks.com/error-messages/error-classes.html#cf_cannot_parse_queue_message), [CF\\_CANNOT\\_RESOLVE\\_CONTAINER\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#cf_cannot_resolve_container_name), [CF\\_CANNOT\\_RUN\\_DIRECTORY\\_LISTING](https://docs.databricks.com/error-messages/error-classes.html#cf_cannot_run_directory_listing), [CF\\_DUPLICATE\\_COLUMN\\_IN\\_DATA](https://docs.databricks.com/error-messages/error-classes.html#cf_duplicate_column_in_data), [CF\\_EVENT\\_GRID\\_AUTH\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#cf_event_grid_auth_error), [CF\\_EVENT\\_GRID\\_CREATION\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#cf_event_grid_creation_failed), [CF\\_EVENT\\_GRID\\_NOT\\_FOUND\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#cf_event_grid_not_found_error), [CF\\_FAILED\\_TO\\_CHECK\\_STREAM\\_NEW](https://docs.databricks.com/error-messages/error-classes.html#cf_failed_to_check_stream_new), [CF\\_FAILED\\_TO\\_CREATED\\_PUBSUB\\_SUBSCRIPTION](https://docs.databricks.com/error-messages/error-classes.html#cf_failed_to_created_pubsub_subscription), [CF\\_FAILED\\_TO\\_CREATED\\_PUBSUB\\_TOPIC](https://docs.databricks.com/error-messages/error-classes.html#cf_failed_to_created_pubsub_topic), [CF\\_FAILED\\_TO\\_DELETE\\_GCP\\_NOTIFICATION](https://docs.databricks.com/error-messages/error-classes.html#cf_failed_to_delete_gcp_notification), [CF\\_FAILED\\_TO\\_DESERIALIZE\\_PERSISTED\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#cf_failed_to_deserialize_persisted_schema), [CF\\_FAILED\\_TO\\_EVOLVE\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#cf_failed_to_evolve_schema), 
[CF\\_FAILED\\_TO\\_INFER\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#cf_failed_to_infer_schema), [CF\\_FAILED\\_TO\\_WRITE\\_TO\\_SCHEMA\\_LOG](https://docs.databricks.com/error-messages/error-classes.html#cf_failed_to_write_to_schema_log), [CF\\_FOUND\\_MULTIPLE\\_AUTOLOADER\\_PUBSUB\\_SUBSCRIPTIONS](https://docs.databricks.com/error-messages/error-classes.html#cf_found_multiple_autoloader_pubsub_subscriptions), [CF\\_GCP\\_LABELS\\_COUNT\\_EXCEEDED](https://docs.databricks.com/error-messages/error-classes.html#cf_gcp_labels_count_exceeded), [CF\\_GCP\\_RESOURCE\\_TAGS\\_COUNT\\_EXCEEDED](https://docs.databricks.com/error-messages/error-classes.html#cf_gcp_resource_tags_count_exceeded), [CF\\_INCOMPLETE\\_LOG\\_FILE\\_IN\\_SCHEMA\\_LOG](https://docs.databricks.com/error-messages/error-classes.html#cf_incomplete_log_file_in_schema_log), [CF\\_INCOMPLETE\\_METADATA\\_FILE\\_IN\\_CHECKPOINT](https://docs.databricks.com/error-messages/error-classes.html#cf_incomplete_metadata_file_in_checkpoint), [CF\\_INVALID\\_MANAGED\\_FILE\\_EVENTS\\_RESPONSE](https://docs.databricks.com/error-messages/cf-invalid-managed-file-events-response-error-class.html), [CF\\_LATEST\\_OFFSET\\_READ\\_LIMIT\\_REQUIRED](https://docs.databricks.com/error-messages/error-classes.html#cf_latest_offset_read_limit_required), [CF\\_LOG\\_FILE\\_MALFORMED](https://docs.databricks.com/error-messages/error-classes.html#cf_log_file_malformed), [CF\\_MANAGED\\_FILE\\_EVENTS\\_BACKFILL\\_IN\\_PROGRESS](https://docs.databricks.com/error-messages/error-classes.html#cf_managed_file_events_backfill_in_progress), [CF\\_METADATA\\_FILE\\_CONCURRENTLY\\_USED](https://docs.databricks.com/error-messages/error-classes.html#cf_metadata_file_concurrently_used), [CF\\_MULTIPLE\\_PUBSUB\\_NOTIFICATIONS\\_FOR\\_TOPIC](https://docs.databricks.com/error-messages/error-classes.html#cf_multiple_pubsub_notifications_for_topic), 
[CF\\_NEW\\_PARTITION\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#cf_new_partition_error), [CF\\_PARTITON\\_INFERENCE\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#cf_partiton_inference_error), [CF\\_PREFIX\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#cf_prefix_mismatch), [CF\\_PROTOCOL\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#cf_protocol_mismatch), [CF\\_RESTRICTED\\_GCP\\_RESOURCE\\_TAG\\_KEY](https://docs.databricks.com/error-messages/error-classes.html#cf_restricted_gcp_resource_tag_key), [CF\\_SAME\\_PUB\\_SUB\\_TOPIC\\_NEW\\_KEY\\_PREFIX](https://docs.databricks.com/error-messages/error-classes.html#cf_same_pub_sub_topic_new_key_prefix), [CF\\_THREAD\\_IS\\_DEAD](https://docs.databricks.com/error-messages/error-classes.html#cf_thread_is_dead), [CF\\_UNABLE\\_TO\\_LIST\\_EFFICIENTLY](https://docs.databricks.com/error-messages/error-classes.html#cf_unable_to_list_efficiently), [CF\\_UNEXPECTED\\_READ\\_LIMIT](https://docs.databricks.com/error-messages/error-classes.html#cf_unexpected_read_limit), [CF\\_UNKNOWN\\_READ\\_LIMIT](https://docs.databricks.com/error-messages/error-classes.html#cf_unknown_read_limit), [DELTA\\_CANNOT\\_FIND\\_BUCKET\\_SPEC](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_find_bucket_spec), [DELTA\\_CLUSTERING\\_COLUMN\\_MISSING\\_STATS](https://docs.databricks.com/error-messages/error-classes.html#delta_clustering_column_missing_stats), [DELTA\\_CLUSTERING\\_CREATE\\_EXTERNAL\\_NON\\_LIQUID\\_TABLE\\_FROM\\_LIQUID\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_clustering_create_external_non_liquid_table_from_liquid_table), [DELTA\\_COMPACTION\\_VALIDATION\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#delta_compaction_validation_failed), 
[DELTA\\_CONVERT\\_TO\\_DELTA\\_ROW\\_TRACKING\\_WITHOUT\\_STATS](https://docs.databricks.com/error-messages/error-classes.html#delta_convert_to_delta_row_tracking_without_stats), [DELTA\\_DV\\_HISTOGRAM\\_DESERIALIZATON](https://docs.databricks.com/error-messages/error-classes.html#delta_dv_histogram_deserializaton), [DELTA\\_INVALID\\_FORMAT](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_format), [DELTA\\_INVALID\\_TABLE\\_VALUE\\_FUNCTION](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_table_value_function), [DELTA\\_MATERIALIZED\\_ROW\\_TRACKING\\_COLUMN\\_NAME\\_MISSING](https://docs.databricks.com/error-messages/error-classes.html#delta_materialized_row_tracking_column_name_missing), [DELTA\\_ROW\\_ID\\_ASSIGNMENT\\_WITHOUT\\_STATS](https://docs.databricks.com/error-messages/error-classes.html#delta_row_id_assignment_without_stats), [DELTA\\_STREAMING\\_METADATA\\_EVOLUTION](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_metadata_evolution), [DELTA\\_STREAMING\\_SCHEMA\\_EVOLUTION\\_UNSUPPORTED\\_ROW\\_FILTER\\_COLUMN\\_MASKS](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_schema_evolution_unsupported_row_filter_column_masks), [DELTA\\_STREAMING\\_SCHEMA\\_LOCATION\\_CONFLICT](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_schema_location_conflict), [DELTA\\_STREAMING\\_SCHEMA\\_LOCATION\\_NOT\\_UNDER\\_CHECKPOINT](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_schema_location_not_under_checkpoint), [DELTA\\_STREAMING\\_SCHEMA\\_LOG\\_DESERIALIZE\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_schema_log_deserialize_failed), [DELTA\\_STREAMING\\_SCHEMA\\_LOG\\_INCOMPATIBLE\\_DELTA\\_TABLE\\_ID](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_schema_log_incompatible_delta_table_id), 
[DELTA\\_STREAMING\\_SCHEMA\\_LOG\\_INCOMPATIBLE\\_PARTITION\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_schema_log_incompatible_partition_schema), [DELTA\\_STREAMING\\_SCHEMA\\_LOG\\_INIT\\_FAILED\\_INCOMPATIBLE\\_METADATA](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_schema_log_init_failed_incompatible_metadata), [DELTA\\_STREAMING\\_SCHEMA\\_LOG\\_PARSE\\_SCHEMA\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_schema_log_parse_schema_failed), [HLL\\_UNION\\_DIFFERENT\\_LG\\_K](https://docs.databricks.com/error-messages/error-classes.html#hll_union_different_lg_k), [INVALID\\_TIMESTAMP\\_FORMAT](https://docs.databricks.com/error-messages/error-classes.html#invalid_timestamp_format), [KAFKA\\_DATA\\_LOSS](https://docs.databricks.com/error-messages/kafka-data-loss-error-class.html), [KINESIS\\_COULD\\_NOT\\_READ\\_SHARD\\_UNTIL\\_END\\_OFFSET](https://docs.databricks.com/error-messages/error-classes.html#kinesis_could_not_read_shard_until_end_offset), [PS\\_FETCH\\_RETRY\\_EXCEPTION](https://docs.databricks.com/error-messages/error-classes.html#ps_fetch_retry_exception), [PS\\_INVALID\\_KEY\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#ps_invalid_key_type), [PS\\_INVALID\\_UNSAFE\\_ROW\\_CONVERSION\\_FROM\\_PROTO](https://docs.databricks.com/error-messages/error-classes.html#ps_invalid_unsafe_row_conversion_from_proto), [PS\\_MOVING\\_CHECKPOINT\\_FAILURE](https://docs.databricks.com/error-messages/error-classes.html#ps_moving_checkpoint_failure), [PS\\_MULTIPLE\\_FAILED\\_EPOCHS](https://docs.databricks.com/error-messages/error-classes.html#ps_multiple_failed_epochs), [PS\\_OPTION\\_NOT\\_IN\\_BOUNDS](https://docs.databricks.com/error-messages/error-classes.html#ps_option_not_in_bounds), [PS\\_UNABLE\\_TO\\_PARSE\\_PROTO](https://docs.databricks.com/error-messages/error-classes.html#ps_unable_to_parse_proto), 
[STAGING\\_PATH\\_CURRENTLY\\_INACCESSIBLE](https://docs.databricks.com/error-messages/error-classes.html#staging_path_currently_inaccessible) |\n| `22001` | string data, right truncation |\n| | [DELTA\\_EXCEED\\_CHAR\\_VARCHAR\\_LIMIT](https://docs.databricks.com/error-messages/error-classes.html#delta_exceed_char_varchar_limit) |\n| `22003` | numeric value out of range |\n| | [ARITHMETIC\\_OVERFLOW](https://docs.databricks.com/error-messages/arithmetic-overflow-error-class.html), [BINARY\\_ARITHMETIC\\_OVERFLOW](https://docs.databricks.com/error-messages/error-classes.html#binary_arithmetic_overflow), [CAST\\_OVERFLOW](https://docs.databricks.com/error-messages/error-classes.html#cast_overflow), [CAST\\_OVERFLOW\\_IN\\_TABLE\\_INSERT](https://docs.databricks.com/error-messages/error-classes.html#cast_overflow_in_table_insert), [DECIMAL\\_PRECISION\\_EXCEEDS\\_MAX\\_PRECISION](https://docs.databricks.com/error-messages/error-classes.html#decimal_precision_exceeds_max_precision), [DELTA\\_CANNOT\\_RESTORE\\_TABLE\\_VERSION](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_restore_table_version), [DELTA\\_CANNOT\\_RESTORE\\_TIMESTAMP\\_GREATER](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_restore_timestamp_greater), [DELTA\\_CAST\\_OVERFLOW\\_IN\\_TABLE\\_WRITE](https://docs.databricks.com/error-messages/error-classes.html#delta_cast_overflow_in_table_write), [DELTA\\_INVALID\\_CDC\\_RANGE](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_cdc_range), [INCORRECT\\_RAMP\\_UP\\_RATE](https://docs.databricks.com/error-messages/error-classes.html#incorrect_ramp_up_rate), [INVALID\\_ARRAY\\_INDEX](https://docs.databricks.com/error-messages/invalid-array-index-error-class.html), [INVALID\\_ARRAY\\_INDEX\\_IN\\_ELEMENT\\_AT](https://docs.databricks.com/error-messages/invalid-array-index-in-element-at-error-class.html), 
[INVALID\\_BITMAP\\_POSITION](https://docs.databricks.com/error-messages/error-classes.html#invalid_bitmap_position), [INVALID\\_BOUNDARY](https://docs.databricks.com/error-messages/invalid-boundary-error-class.html), [INVALID\\_INDEX\\_OF\\_ZERO](https://docs.databricks.com/error-messages/error-classes.html#invalid_index_of_zero), [INVALID\\_NUMERIC\\_LITERAL\\_RANGE](https://docs.databricks.com/error-messages/error-classes.html#invalid_numeric_literal_range), [NUMERIC\\_OUT\\_OF\\_SUPPORTED\\_RANGE](https://docs.databricks.com/error-messages/error-classes.html#numeric_out_of_supported_range), [NUMERIC\\_VALUE\\_OUT\\_OF\\_RANGE](https://docs.databricks.com/error-messages/error-classes.html#numeric_value_out_of_range), [SUM\\_OF\\_LIMIT\\_AND\\_OFFSET\\_EXCEEDS\\_MAX\\_INT](https://docs.databricks.com/error-messages/error-classes.html#sum_of_limit_and_offset_exceeds_max_int) |\n| `22004` | null value not allowed |\n| | [COMPARATOR\\_RETURNS\\_NULL](https://docs.databricks.com/error-messages/error-classes.html#comparator_returns_null), [TWS\\_VALUE\\_SHOULD\\_NOT\\_BE\\_NULL](https://docs.databricks.com/error-messages/error-classes.html#tws_value_should_not_be_null) |\n| `22005` | error in assignment |\n| | [DELTA\\_COMPLEX\\_TYPE\\_COLUMN\\_CONTAINS\\_NULL\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#delta_complex_type_column_contains_null_type), [DELTA\\_FAILED\\_TO\\_MERGE\\_FIELDS](https://docs.databricks.com/error-messages/error-classes.html#delta_failed_to_merge_fields), [DELTA\\_MERGE\\_UNEXPECTED\\_ASSIGNMENT\\_KEY](https://docs.databricks.com/error-messages/error-classes.html#delta_merge_unexpected_assignment_key) |\n| `22006` | invalid interval format |\n| | [CANNOT\\_PARSE\\_INTERVAL](https://docs.databricks.com/error-messages/error-classes.html#cannot_parse_interval), [DELTA\\_INVALID\\_INTERVAL](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_interval) |\n| `22007` | invalid datetime format |\n| | 
[CANNOT\\_PARSE\\_TIMESTAMP](https://docs.databricks.com/error-messages/error-classes.html#cannot_parse_timestamp), [DELTA\\_INVALID\\_TIMESTAMP\\_FORMAT](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_timestamp_format) |\n| `22008` | datetime field overflow |\n| | [DATETIME\\_OVERFLOW](https://docs.databricks.com/error-messages/error-classes.html#datetime_overflow) |\n| `2200E` | null value in array target |\n| | [NULL\\_MAP\\_KEY](https://docs.databricks.com/error-messages/error-classes.html#null_map_key) |\n| `2200G` | most specific type mismatch |\n| | [DELTA\\_COLUMN\\_STRUCT\\_TYPE\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_column_struct_type_mismatch) |\n| `2200P` | interval value out of range |\n| | [DELTA\\_INVALID\\_CALENDAR\\_INTERVAL\\_EMPTY](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_calendar_interval_empty) |\n| `22012` | division by zero |\n| | [DIVIDE\\_BY\\_ZERO](https://docs.databricks.com/error-messages/divide-by-zero-error-class.html), [INTERVAL\\_DIVIDED\\_BY\\_ZERO](https://docs.databricks.com/error-messages/error-classes.html#interval_divided_by_zero) |\n| `22015` | interval field overflow |\n| | [INTERVAL\\_ARITHMETIC\\_OVERFLOW](https://docs.databricks.com/error-messages/error-classes.html#interval_arithmetic_overflow) |\n| `22018` | invalid character value for cast |\n| | [CANNOT\\_PARSE\\_DECIMAL](https://docs.databricks.com/error-messages/error-classes.html#cannot_parse_decimal), [CANNOT\\_PARSE\\_PROTOBUF\\_DESCRIPTOR](https://docs.databricks.com/error-messages/error-classes.html#cannot_parse_protobuf_descriptor), [CAST\\_INVALID\\_INPUT](https://docs.databricks.com/error-messages/cast-invalid-input-error-class.html), [CONVERSION\\_INVALID\\_INPUT](https://docs.databricks.com/error-messages/error-classes.html#conversion_invalid_input), 
[DELTA\\_FAILED\\_CAST\\_PARTITION\\_VALUE](https://docs.databricks.com/error-messages/error-classes.html#delta_failed_cast_partition_value), [FAILED\\_PARSE\\_STRUCT\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#failed_parse_struct_type) |\n| `2201B` | invalid regular expression |\n| | [DELTA\\_REGEX\\_OPT\\_SYNTAX\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#delta_regex_opt_syntax_error) |\n| `22023` | invalid parameter value |\n| | [CONFLICTING\\_PROVIDER](https://docs.databricks.com/error-messages/error-classes.html#conflicting_provider), [DELTA\\_INVALID\\_AUTO\\_COMPACT\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_auto_compact_type), [DELTA\\_INVALID\\_BUCKET\\_COUNT](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_bucket_count), [DELTA\\_INVALID\\_BUCKET\\_INDEX](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_bucket_index), [DELTA\\_UNSUPPORTED\\_STRATEGY\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_strategy_name), [EMPTY\\_LOCAL\\_FILE\\_IN\\_STAGING\\_ACCESS\\_QUERY](https://docs.databricks.com/error-messages/error-classes.html#empty_local_file_in_staging_access_query), [EWKB\\_PARSE\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#ewkb_parse_error), [GEOJSON\\_PARSE\\_ERROR](https://docs.databricks.com/error-messages/geojson-parse-error-error-class.html), [H3\\_INVALID\\_CELL\\_ID](https://docs.databricks.com/error-messages/h3-invalid-cell-id-error-class.html), [H3\\_INVALID\\_GRID\\_DISTANCE\\_VALUE](https://docs.databricks.com/error-messages/h3-invalid-grid-distance-value-error-class.html), [H3\\_INVALID\\_RESOLUTION\\_VALUE](https://docs.databricks.com/error-messages/h3-invalid-resolution-value-error-class.html), [H3\\_PENTAGON\\_ENCOUNTERED\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#h3_pentagon_encountered_error), 
[H3\\_UNDEFINED\\_GRID\\_DISTANCE](https://docs.databricks.com/error-messages/error-classes.html#h3_undefined_grid_distance), [INVALID\\_FRACTION\\_OF\\_SECOND](https://docs.databricks.com/error-messages/error-classes.html#invalid_fraction_of_second), [INVALID\\_PARAMETER\\_MARKER\\_VALUE](https://docs.databricks.com/error-messages/invalid-parameter-marker-value-error-class.html), [INVALID\\_PARAMETER\\_VALUE](https://docs.databricks.com/error-messages/invalid-parameter-value-error-class.html), [MALFORMED\\_RECORD\\_IN\\_PARSING](https://docs.databricks.com/error-messages/malformed-record-in-parsing-error-class.html), [MALFORMED\\_VARIANT](https://docs.databricks.com/error-messages/error-classes.html#malformed_variant), [MAX\\_RECORDS\\_PER\\_FETCH\\_INVALID\\_FOR\\_KINESIS\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#max_records_per_fetch_invalid_for_kinesis_source), [RULE\\_ID\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#rule_id_not_found), [SECOND\\_FUNCTION\\_ARGUMENT\\_NOT\\_INTEGER](https://docs.databricks.com/error-messages/error-classes.html#second_function_argument_not_integer), [ST\\_DIFFERENT\\_SRID\\_VALUES](https://docs.databricks.com/error-messages/error-classes.html#st_different_srid_values), [ST\\_INVALID\\_ARGUMENT](https://docs.databricks.com/error-messages/error-classes.html#st_invalid_argument), [ST\\_INVALID\\_ARGUMENT\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#st_invalid_argument_type), [ST\\_INVALID\\_CRS\\_TRANSFORMATION\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#st_invalid_crs_transformation_error), [ST\\_INVALID\\_ENDIANNESS\\_VALUE](https://docs.databricks.com/error-messages/error-classes.html#st_invalid_endianness_value), [ST\\_INVALID\\_GEOHASH\\_VALUE](https://docs.databricks.com/error-messages/error-classes.html#st_invalid_geohash_value), 
[ST\\_INVALID\\_PRECISION\\_VALUE](https://docs.databricks.com/error-messages/error-classes.html#st_invalid_precision_value), [ST\\_INVALID\\_SRID\\_VALUE](https://docs.databricks.com/error-messages/error-classes.html#st_invalid_srid_value), [TABLE\\_VALUED\\_FUNCTION\\_REQUIRED\\_METADATA\\_INCOMPATIBLE\\_WITH\\_CALL](https://docs.databricks.com/error-messages/error-classes.html#table_valued_function_required_metadata_incompatible_with_call), [TABLE\\_VALUED\\_FUNCTION\\_REQUIRED\\_METADATA\\_INVALID](https://docs.databricks.com/error-messages/error-classes.html#table_valued_function_required_metadata_invalid), [VARIANT\\_SIZE\\_LIMIT](https://docs.databricks.com/error-messages/error-classes.html#variant_size_limit), [WKB\\_PARSE\\_ERROR](https://docs.databricks.com/error-messages/wkb-parse-error-error-class.html), [WKT\\_PARSE\\_ERROR](https://docs.databricks.com/error-messages/wkt-parse-error-error-class.html), [XML\\_WILDCARD\\_RESCUED\\_DATA\\_CONFLICT\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#xml_wildcard_rescued_data_conflict_error) |\n| `22032` | invalid JSON text |\n| | [AI\\_FUNCTION\\_INVALID\\_MAX\\_WORDS](https://docs.databricks.com/error-messages/error-classes.html#ai_function_invalid_max_words), [AI\\_INVALID\\_ARGUMENT\\_VALUE\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#ai_invalid_argument_value_error), [INVALID\\_JSON\\_ROOT\\_FIELD](https://docs.databricks.com/error-messages/error-classes.html#invalid_json_root_field), [INVALID\\_JSON\\_SCHEMA\\_MAP\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#invalid_json_schema_map_type), [REMOTE\\_FUNCTION\\_HTTP\\_RESULT\\_PARSE\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#remote_function_http_result_parse_error) |\n| `2203G` | sql json item cannot be cast to target type |\n| | 
[AI\\_FUNCTION\\_HTTP\\_PARSE\\_CAST\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#ai_function_http_parse_cast_error), [AI\\_FUNCTION\\_HTTP\\_PARSE\\_COLUMNS\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#ai_function_http_parse_columns_error), [CANNOT\\_PARSE\\_JSON\\_FIELD](https://docs.databricks.com/error-messages/error-classes.html#cannot_parse_json_field) |\n| `22525` | Partitioning key value is not valid. |\n| | [DELTA\\_PARTITION\\_COLUMN\\_CAST\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#delta_partition_column_cast_failed) |\n| `22531` | The argument of a built-in or system provided routine resulted in an error. |\n| | [INVALID\\_SECRET\\_LOOKUP](https://docs.databricks.com/error-messages/invalid-secret-lookup-error-class.html) |\n| `22546` | The value for a routine argument is not valid. |\n| | [CANNOT\\_DECODE\\_URL](https://docs.databricks.com/error-messages/error-classes.html#cannot_decode_url), [HLL\\_INVALID\\_INPUT\\_SKETCH\\_BUFFER](https://docs.databricks.com/error-messages/error-classes.html#hll_invalid_input_sketch_buffer), [HLL\\_INVALID\\_LG\\_K](https://docs.databricks.com/error-messages/error-classes.html#hll_invalid_lg_k) |\n| `22KD1` | Invalid URI or PATH |\n| | [DELTA\\_CANNOT\\_RECONSTRUCT\\_PATH\\_FROM\\_URI](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_reconstruct_path_from_uri), [DELTA\\_CANNOT\\_RENAME\\_PATH](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_rename_path), [DELTA\\_INVALID\\_CLONE\\_PATH](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_clone_path), [DELTA\\_INVALID\\_PARTITION\\_PATH](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_partition_path) |\n| `22KD2` | Identity claim is unset |\n| | 
[OAUTH\\_CUSTOM\\_IDENTITY\\_CLAIM\\_NOT\\_PROVIDED](https://docs.databricks.com/error-messages/error-classes.html#oauth_custom_identity_claim_not_provided) |\n| `22KD3` | Cannot evolve source type to target type. |\n| | [AVRO\\_INCOMPATIBLE\\_READ\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#avro_incompatible_read_type), [FROM\\_JSON\\_SCHEMA\\_EVOLUTION\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#from_json_schema_evolution_failed) |\n| `22P02` | invalid text representation |\n| | [INVALID\\_URL](https://docs.databricks.com/error-messages/error-classes.html#invalid_url) |\n| `22P03` | invalid binary representation |\n| | [INVALID\\_BYTE\\_STRING](https://docs.databricks.com/error-messages/error-classes.html#invalid_byte_string) |\n\n", "chunk_id": "8340a587c60271133e787407b766bc7e", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `23`: integrity constraint violation\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `23001` | restrict violation |\n| | [DELTA\\_VIOLATE\\_CONSTRAINT\\_WITH\\_VALUES](https://docs.databricks.com/error-messages/error-classes.html#delta_violate_constraint_with_values) |\n| `23502` | An insert or update value is null, but the column cannot contain null values. |\n| | [DELTA\\_MISSING\\_NOT\\_NULL\\_COLUMN\\_VALUE](https://docs.databricks.com/error-messages/error-classes.html#delta_missing_not_null_column_value), [DELTA\\_NOT\\_NULL\\_CONSTRAINT\\_VIOLATED](https://docs.databricks.com/error-messages/error-classes.html#delta_not_null_constraint_violated) |\n| `23505` | A violation of the constraint imposed by a unique index or a unique constraint occurred. 
|\n| | [DUPLICATED\\_MAP\\_KEY](https://docs.databricks.com/error-messages/error-classes.html#duplicated_map_key), [DUPLICATE\\_KEY](https://docs.databricks.com/error-messages/error-classes.html#duplicate_key) |\n| `23512` | The check constraint cannot be added, because the table contains rows that do not satisfy the constraint definition. |\n| | [DELTA\\_NEW\\_CHECK\\_CONSTRAINT\\_VIOLATION](https://docs.databricks.com/error-messages/error-classes.html#delta_new_check_constraint_violation), [DELTA\\_NEW\\_NOT\\_NULL\\_VIOLATION](https://docs.databricks.com/error-messages/error-classes.html#delta_new_not_null_violation) |\n| `23K01` | MERGE cardinality violation |\n| | [MERGE\\_CARDINALITY\\_VIOLATION](https://docs.databricks.com/error-messages/error-classes.html#merge_cardinality_violation) |\n\n", "chunk_id": "af73e0a280d23d115becb71bb744e55e", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `25`: invalid transaction state\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `25000` | invalid transaction state |\n| | [COPY\\_INTO\\_DUPLICATED\\_FILES\\_COPY\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#copy_into_duplicated_files_copy_not_allowed), [COPY\\_INTO\\_NON\\_BLIND\\_APPEND\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#copy_into_non_blind_append_not_allowed), [COPY\\_INTO\\_ROCKSDB\\_MAX\\_RETRY\\_EXCEEDED](https://docs.databricks.com/error-messages/error-classes.html#copy_into_rocksdb_max_retry_exceeded), [DATA\\_LINEAGE\\_SECURE\\_VIEW\\_LEAF\\_NODE\\_HAS\\_NO\\_RELATION](https://docs.databricks.com/error-messages/error-classes.html#data_lineage_secure_view_leaf_node_has_no_relation), [DELTA\\_INVALID\\_ISOLATION\\_LEVEL](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_isolation_level), 
[DELTA\\_MERGE\\_MATERIALIZE\\_SOURCE\\_FAILED\\_REPEATEDLY](https://docs.databricks.com/error-messages/error-classes.html#delta_merge_materialize_source_failed_repeatedly) |\n\n", "chunk_id": "b544ec020ff061fc884d2d4639a316ad", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `2B`: dependent privilege descriptors still exist\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `2BP01` | dependent objects still exist |\n| | [SCHEMA\\_NOT\\_EMPTY](https://docs.databricks.com/error-messages/error-classes.html#schema_not_empty) |\n\n#### SQLSTATE error codes\n##### Class `2D`: invalid transaction termination\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `2D521` | SQL COMMIT or ROLLBACK are invalid in the current operating environment. |\n| | [DELTA\\_DELETION\\_VECTOR\\_MISSING\\_NUM\\_RECORDS](https://docs.databricks.com/error-messages/error-classes.html#delta_deletion_vector_missing_num_records), [DELTA\\_DUPLICATE\\_ACTIONS\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_duplicate_actions_found) |\n| `2DKD0` | Post commit hook failed. 
|\n| | [DELTA\\_POST\\_COMMIT\\_HOOK\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#delta_post_commit_hook_failed) |\n\n", "chunk_id": "33987558c5523e4250a8703695197686", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `38`: external routine exception\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `38000` | external routine exception |\n| | [FAILED\\_FUNCTION\\_CALL](https://docs.databricks.com/error-messages/error-classes.html#failed_function_call), [INVALID\\_UDF\\_IMPLEMENTATION](https://docs.databricks.com/error-messages/error-classes.html#invalid_udf_implementation), [NO\\_UDF\\_INTERFACE](https://docs.databricks.com/error-messages/error-classes.html#no_udf_interface), [PYTHON\\_DATA\\_SOURCE\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#python_data_source_error), [TABLE\\_VALUED\\_FUNCTION\\_FAILED\\_TO\\_ANALYZE\\_IN\\_PYTHON](https://docs.databricks.com/error-messages/error-classes.html#table_valued_function_failed_to_analyze_in_python) |\n\n#### SQLSTATE error codes\n##### Class `39`: external routine invocation exception\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `39000` | external routine invocation exception |\n| | [FAILED\\_EXECUTE\\_UDF](https://docs.databricks.com/error-messages/error-classes.html#failed_execute_udf), [FOREACH\\_BATCH\\_USER\\_FUNCTION\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#foreach_batch_user_function_error), [UDF\\_USER\\_CODE\\_ERROR](https://docs.databricks.com/error-messages/udf-user-code-error-error-class.html) |\n\n", "chunk_id": "bc1cf5717125a3e965381533547eb745", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error 
codes\n##### Class `3D`: invalid catalog name\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `3D000` | invalid catalog name |\n| | [UC\\_CATALOG\\_NAME\\_NOT\\_PROVIDED](https://docs.databricks.com/error-messages/error-classes.html#uc_catalog_name_not_provided) |\n\n#### SQLSTATE error codes\n##### Class `40`: transaction rollback\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `40000` | transaction rollback |\n| | [CONCURRENT\\_STREAM\\_LOG\\_UPDATE](https://docs.databricks.com/error-messages/error-classes.html#concurrent_stream_log_update), [DELTA\\_MAX\\_COMMIT\\_RETRIES\\_EXCEEDED](https://docs.databricks.com/error-messages/error-classes.html#delta_max_commit_retries_exceeded) |\n\n", "chunk_id": "30dc28f14c5a4534c5f2dece93c08a0a", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `42`: syntax error or access rule violation\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `42000` | syntax error or access rule violation |\n| | [AMBIGUOUS\\_REFERENCE\\_TO\\_FIELDS](https://docs.databricks.com/error-messages/error-classes.html#ambiguous_reference_to_fields), [CF\\_AMBIGUOUS\\_AUTH\\_OPTIONS\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#cf_ambiguous_auth_options_error), [CF\\_AMBIGUOUS\\_INCREMENTAL\\_LISTING\\_MODE\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#cf_ambiguous_incremental_listing_mode_error), [CF\\_AZURE\\_STORAGE\\_SUFFIXES\\_REQUIRED](https://docs.databricks.com/error-messages/error-classes.html#cf_azure_storage_suffixes_required), [CF\\_CLEAN\\_SOURCE\\_ALLOW\\_OVERWRITES\\_BOTH\\_ON](https://docs.databricks.com/error-messages/error-classes.html#cf_clean_source_allow_overwrites_both_on), 
[CF\\_EMPTY\\_DIR\\_FOR\\_SCHEMA\\_INFERENCE](https://docs.databricks.com/error-messages/error-classes.html#cf_empty_dir_for_schema_inference), [CF\\_FAILED\\_TO\\_FIND\\_PROVIDER](https://docs.databricks.com/error-messages/error-classes.html#cf_failed_to_find_provider), [CF\\_FILE\\_FORMAT\\_REQUIRED](https://docs.databricks.com/error-messages/error-classes.html#cf_file_format_required), [CF\\_GCP\\_AUTHENTICATION](https://docs.databricks.com/error-messages/error-classes.html#cf_gcp_authentication), [CF\\_INCORRECT\\_SQL\\_PARAMS](https://docs.databricks.com/error-messages/error-classes.html#cf_incorrect_sql_params), [CF\\_INTERNAL\\_ERROR](https://docs.databricks.com/error-messages/cf-internal-error-error-class.html), [CF\\_INVALID\\_ARN](https://docs.databricks.com/error-messages/error-classes.html#cf_invalid_arn), [CF\\_INVALID\\_CHECKPOINT](https://docs.databricks.com/error-messages/error-classes.html#cf_invalid_checkpoint), [CF\\_INVALID\\_CLEAN\\_SOURCE\\_MODE](https://docs.databricks.com/error-messages/error-classes.html#cf_invalid_clean_source_mode), [CF\\_INVALID\\_GCP\\_RESOURCE\\_TAG\\_KEY](https://docs.databricks.com/error-messages/error-classes.html#cf_invalid_gcp_resource_tag_key), [CF\\_INVALID\\_GCP\\_RESOURCE\\_TAG\\_VALUE](https://docs.databricks.com/error-messages/error-classes.html#cf_invalid_gcp_resource_tag_value), [CF\\_INVALID\\_MANAGED\\_FILE\\_EVENTS\\_OPTION\\_KEYS](https://docs.databricks.com/error-messages/error-classes.html#cf_invalid_managed_file_events_option_keys), [CF\\_INVALID\\_SCHEMA\\_EVOLUTION\\_MODE](https://docs.databricks.com/error-messages/error-classes.html#cf_invalid_schema_evolution_mode), [CF\\_INVALID\\_SCHEMA\\_HINTS\\_OPTION](https://docs.databricks.com/error-messages/error-classes.html#cf_invalid_schema_hints_option), [CF\\_INVALID\\_SCHEMA\\_HINT\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#cf_invalid_schema_hint_column), 
[CF\\_MANAGED\\_FILE\\_EVENTS\\_ENDPOINT\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#cf_managed_file_events_endpoint_not_found), [CF\\_MANAGED\\_FILE\\_EVENTS\\_ENDPOINT\\_PERMISSION\\_DENIED](https://docs.databricks.com/error-messages/error-classes.html#cf_managed_file_events_endpoint_permission_denied), [CF\\_MAX\\_MUST\\_BE\\_POSITIVE](https://docs.databricks.com/error-messages/error-classes.html#cf_max_must_be_positive), [CF\\_MISSING\\_METADATA\\_FILE\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#cf_missing_metadata_file_error), [CF\\_MISSING\\_PARTITION\\_COLUMN\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#cf_missing_partition_column_error), [CF\\_MISSING\\_SCHEMA\\_IN\\_PATHLESS\\_MODE](https://docs.databricks.com/error-messages/error-classes.html#cf_missing_schema_in_pathless_mode), [CF\\_PATH\\_DOES\\_NOT\\_EXIST\\_FOR\\_READ\\_FILES](https://docs.databricks.com/error-messages/error-classes.html#cf_path_does_not_exist_for_read_files), [CF\\_REGION\\_NOT\\_FOUND\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#cf_region_not_found_error), [CF\\_RESOURCE\\_SUFFIX\\_EMPTY](https://docs.databricks.com/error-messages/error-classes.html#cf_resource_suffix_empty), [CF\\_RESOURCE\\_SUFFIX\\_INVALID\\_CHAR\\_AWS](https://docs.databricks.com/error-messages/error-classes.html#cf_resource_suffix_invalid_char_aws), [CF\\_RESOURCE\\_SUFFIX\\_INVALID\\_CHAR\\_AZURE](https://docs.databricks.com/error-messages/error-classes.html#cf_resource_suffix_invalid_char_azure), [CF\\_RESOURCE\\_SUFFIX\\_INVALID\\_CHAR\\_GCP](https://docs.databricks.com/error-messages/error-classes.html#cf_resource_suffix_invalid_char_gcp), [CF\\_RESOURCE\\_SUFFIX\\_LIMIT](https://docs.databricks.com/error-messages/error-classes.html#cf_resource_suffix_limit), [CF\\_RESOURCE\\_SUFFIX\\_LIMIT\\_GCP](https://docs.databricks.com/error-messages/error-classes.html#cf_resource_suffix_limit_gcp), 
[CF\\_RETENTION\\_GREATER\\_THAN\\_MAX\\_FILE\\_AGE](https://docs.databricks.com/error-messages/error-classes.html#cf_retention_greater_than_max_file_age), [CF\\_SOURCE\\_DIRECTORY\\_PATH\\_REQUIRED](https://docs.databricks.com/error-messages/error-classes.html#cf_source_directory_path_required), [CF\\_STATE\\_INCORRECT\\_SQL\\_PARAMS](https://docs.databricks.com/error-messages/error-classes.html#cf_state_incorrect_sql_params), [CF\\_STATE\\_INVALID\\_CHECKPOINT\\_PATH](https://docs.databricks.com/error-messages/error-classes.html#cf_state_invalid_checkpoint_path), [CF\\_STATE\\_INVALID\\_VERSION](https://docs.databricks.com/error-messages/error-classes.html#cf_state_invalid_version), [CF\\_UNABLE\\_TO\\_DERIVE\\_STREAM\\_CHECKPOINT\\_LOCATION](https://docs.databricks.com/error-messages/error-classes.html#cf_unable_to_derive_stream_checkpoint_location), [CF\\_UNABLE\\_TO\\_DETECT\\_FILE\\_FORMAT](https://docs.databricks.com/error-messages/error-classes.html#cf_unable_to_detect_file_format), [CF\\_UNABLE\\_TO\\_EXTRACT\\_BUCKET\\_INFO](https://docs.databricks.com/error-messages/error-classes.html#cf_unable_to_extract_bucket_info), [CF\\_UNABLE\\_TO\\_EXTRACT\\_KEY\\_INFO](https://docs.databricks.com/error-messages/error-classes.html#cf_unable_to_extract_key_info), [CF\\_UNABLE\\_TO\\_EXTRACT\\_STORAGE\\_ACCOUNT\\_INFO](https://docs.databricks.com/error-messages/error-classes.html#cf_unable_to_extract_storage_account_info), [CF\\_UNKNOWN\\_OPTION\\_KEYS\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#cf_unknown_option_keys_error), [CF\\_USE\\_DELTA\\_FORMAT](https://docs.databricks.com/error-messages/error-classes.html#cf_use_delta_format), [CONNECTION\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#connection_already_exists), [CONNECTION\\_NAME\\_CANNOT\\_BE\\_EMPTY](https://docs.databricks.com/error-messages/error-classes.html#connection_name_cannot_be_empty), 
[CONNECTION\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#connection_not_found), [CONNECTION\\_OPTION\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#connection_option_not_supported), [CONNECTION\\_TYPE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#connection_type_not_supported), [DELTA\\_ADDING\\_COLUMN\\_WITH\\_INTERNAL\\_NAME\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#delta_adding_column_with_internal_name_failed), [DELTA\\_ADDING\\_DELETION\\_VECTORS\\_WITH\\_TIGHT\\_BOUNDS\\_DISALLOWED](https://docs.databricks.com/error-messages/error-classes.html#delta_adding_deletion_vectors_with_tight_bounds_disallowed), [DELTA\\_ALTER\\_TABLE\\_CLUSTER\\_BY\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#delta_alter_table_cluster_by_not_allowed), [DELTA\\_ALTER\\_TABLE\\_CLUSTER\\_BY\\_ON\\_PARTITIONED\\_TABLE\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#delta_alter_table_cluster_by_on_partitioned_table_not_allowed), [DELTA\\_ALTER\\_TABLE\\_RENAME\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#delta_alter_table_rename_not_allowed), [DELTA\\_ALTER\\_TABLE\\_SET\\_CLUSTERING\\_TABLE\\_FEATURE\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#delta_alter_table_set_clustering_table_feature_not_allowed), [DELTA\\_CANNOT\\_RELATIVIZE\\_PATH](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_relativize_path), [DELTA\\_CLONE\\_INCOMPLETE\\_FILE\\_COPY](https://docs.databricks.com/error-messages/error-classes.html#delta_clone_incomplete_file_copy), [DELTA\\_CLUSTERING\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#delta_clustering_not_supported), 
[DELTA\\_CLUSTERING\\_REPLACE\\_TABLE\\_WITH\\_PARTITIONED\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_clustering_replace_table_with_partitioned_table), [DELTA\\_CLUSTERING\\_WITH\\_DYNAMIC\\_PARTITION\\_OVERWRITE](https://docs.databricks.com/error-messages/error-classes.html#delta_clustering_with_dynamic_partition_overwrite), [DELTA\\_CREATE\\_TABLE\\_SET\\_CLUSTERING\\_TABLE\\_FEATURE\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#delta_create_table_set_clustering_table_feature_not_allowed), [DELTA\\_INCONSISTENT\\_BUCKET\\_SPEC](https://docs.databricks.com/error-messages/error-classes.html#delta_inconsistent_bucket_spec), [DELTA\\_INCORRECT\\_GET\\_CONF](https://docs.databricks.com/error-messages/error-classes.html#delta_incorrect_get_conf), [DELTA\\_INVALID\\_MANAGED\\_TABLE\\_SYNTAX\\_NO\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_managed_table_syntax_no_schema), [DELTA\\_MAX\\_ARRAY\\_SIZE\\_EXCEEDED](https://docs.databricks.com/error-messages/error-classes.html#delta_max_array_size_exceeded), [DELTA\\_MAX\\_LIST\\_FILE\\_EXCEEDED](https://docs.databricks.com/error-messages/error-classes.html#delta_max_list_file_exceeded), [DELTA\\_MISSING\\_TRANSACTION\\_LOG](https://docs.databricks.com/error-messages/error-classes.html#delta_missing_transaction_log), [DELTA\\_NAME\\_CONFLICT\\_IN\\_BUCKETED\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_name_conflict_in_bucketed_table), [DELTA\\_NOT\\_A\\_DATABRICKS\\_DELTA\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_not_a_databricks_delta_table), [DELTA\\_OVERWRITE\\_MUST\\_BE\\_TRUE](https://docs.databricks.com/error-messages/error-classes.html#delta_overwrite_must_be_true), [DELTA\\_STATS\\_COLLECTION\\_COLUMN\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_stats_collection_column_not_found), 
[FOREIGN\\_OBJECT\\_NAME\\_CANNOT\\_BE\\_EMPTY](https://docs.databricks.com/error-messages/error-classes.html#foreign_object_name_cannot_be_empty), [INVALID\\_COLUMN\\_OR\\_FIELD\\_DATA\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#invalid_column_or_field_data_type), [INVALID\\_EXTRACT\\_BASE\\_FIELD\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#invalid_extract_base_field_type), [INVALID\\_EXTRACT\\_FIELD\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#invalid_extract_field_type), [INVALID\\_FIELD\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#invalid_field_name), [INVALID\\_INLINE\\_TABLE](https://docs.databricks.com/error-messages/invalid-inline-table-error-class.html), [INVALID\\_SAVE\\_MODE](https://docs.databricks.com/error-messages/error-classes.html#invalid_save_mode), [INVALID\\_SET\\_SYNTAX](https://docs.databricks.com/error-messages/error-classes.html#invalid_set_syntax), [INVALID\\_SQL\\_SYNTAX](https://docs.databricks.com/error-messages/invalid-sql-syntax-error-class.html), [INVALID\\_USAGE\\_OF\\_STAR\\_OR\\_REGEX](https://docs.databricks.com/error-messages/error-classes.html#invalid_usage_of_star_or_regex), [INVALID\\_WRITE\\_DISTRIBUTION](https://docs.databricks.com/error-messages/invalid-write-distribution-error-class.html), [MISSING\\_CONNECTION\\_OPTION](https://docs.databricks.com/error-messages/error-classes.html#missing_connection_option), [MISSING\\_NAME\\_FOR\\_CHECK\\_CONSTRAINT](https://docs.databricks.com/error-messages/error-classes.html#missing_name_for_check_constraint), [MULTIPLE\\_LOAD\\_PATH](https://docs.databricks.com/error-messages/error-classes.html#multiple_load_path), [NAMESPACE\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#namespace_already_exists), [NAMESPACE\\_NOT\\_EMPTY](https://docs.databricks.com/error-messages/error-classes.html#namespace_not_empty), 
[NAMESPACE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#namespace_not_found), [NON\\_PARTITION\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#non_partition_column), [NOT\\_NULL\\_CONSTRAINT\\_VIOLATION](https://docs.databricks.com/error-messages/not-null-constraint-violation-error-class.html), [NO\\_HANDLER\\_FOR\\_UDAF](https://docs.databricks.com/error-messages/error-classes.html#no_handler_for_udaf), [NULLABLE\\_COLUMN\\_OR\\_FIELD](https://docs.databricks.com/error-messages/error-classes.html#nullable_column_or_field), [NULLABLE\\_ROW\\_ID\\_ATTRIBUTES](https://docs.databricks.com/error-messages/error-classes.html#nullable_row_id_attributes), [PS\\_INVALID\\_EMPTY\\_OPTION](https://docs.databricks.com/error-messages/error-classes.html#ps_invalid_empty_option), [PS\\_INVALID\\_OPTION](https://docs.databricks.com/error-messages/error-classes.html#ps_invalid_option), [PS\\_INVALID\\_OPTION\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#ps_invalid_option_type), [PS\\_INVALID\\_READ\\_LIMIT](https://docs.databricks.com/error-messages/error-classes.html#ps_invalid_read_limit), [PS\\_MISSING\\_AUTH\\_INFO](https://docs.databricks.com/error-messages/error-classes.html#ps_missing_auth_info), [PS\\_MISSING\\_REQUIRED\\_OPTION](https://docs.databricks.com/error-messages/error-classes.html#ps_missing_required_option), [PS\\_PROVIDE\\_CREDENTIALS\\_WITH\\_OPTION](https://docs.databricks.com/error-messages/error-classes.html#ps_provide_credentials_with_option), [PS\\_UNABLE\\_TO\\_CREATE\\_SUBSCRIPTION](https://docs.databricks.com/error-messages/error-classes.html#ps_unable_to_create_subscription), [ROUTINE\\_PARAMETER\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#routine_parameter_not_found), [STREAMING\\_TABLE\\_QUERY\\_INVALID](https://docs.databricks.com/error-messages/error-classes.html#streaming_table_query_invalid), 
[UNSUPPORTED\\_BATCH\\_TABLE\\_VALUED\\_FUNCTION](https://docs.databricks.com/error-messages/error-classes.html#unsupported_batch_table_valued_function), [UNSUPPORTED\\_CONSTRAINT\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#unsupported_constraint_type), [UNSUPPORTED\\_STREAMING\\_TABLE\\_VALUED\\_FUNCTION](https://docs.databricks.com/error-messages/error-classes.html#unsupported_streaming_table_valued_function), [VOLUME\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#volume_already_exists) |\n| `42001` | Invalid encoder error |\n| | [INVALID\\_EXPRESSION\\_ENCODER](https://docs.databricks.com/error-messages/error-classes.html#invalid_expression_encoder) |\n| `42501` | The authorization ID does not have the privilege to perform the specified operation on the identified object. |\n| | [CANNOT\\_READ\\_SENSITIVE\\_KEY\\_FROM\\_SECURE\\_PROVIDER](https://docs.databricks.com/error-messages/error-classes.html#cannot_read_sensitive_key_from_secure_provider), [CF\\_CLEAN\\_SOURCE\\_UNAUTHORIZED\\_WRITE\\_PERMISSION](https://docs.databricks.com/error-messages/error-classes.html#cf_clean_source_unauthorized_write_permission), [INSUFFICIENT\\_PERMISSIONS](https://docs.databricks.com/error-messages/error-classes.html#insufficient_permissions), [INSUFFICIENT\\_PERMISSIONS\\_EXT\\_LOC](https://docs.databricks.com/error-messages/error-classes.html#insufficient_permissions_ext_loc), [INSUFFICIENT\\_PERMISSIONS\\_NO\\_OWNER](https://docs.databricks.com/error-messages/error-classes.html#insufficient_permissions_no_owner), [INSUFFICIENT\\_PERMISSIONS\\_OWNERSHIP\\_SECURABLE](https://docs.databricks.com/error-messages/error-classes.html#insufficient_permissions_ownership_securable), [INSUFFICIENT\\_PERMISSIONS\\_SECURABLE](https://docs.databricks.com/error-messages/error-classes.html#insufficient_permissions_securable), 
[INSUFFICIENT\\_PERMISSIONS\\_SECURABLE\\_PARENT\\_OWNER](https://docs.databricks.com/error-messages/error-classes.html#insufficient_permissions_securable_parent_owner), [INSUFFICIENT\\_PERMISSIONS\\_STORAGE\\_CRED](https://docs.databricks.com/error-messages/error-classes.html#insufficient_permissions_storage_cred), [INSUFFICIENT\\_PERMISSIONS\\_UNDERLYING\\_SECURABLES](https://docs.databricks.com/error-messages/error-classes.html#insufficient_permissions_underlying_securables), [INSUFFICIENT\\_PERMISSIONS\\_UNDERLYING\\_SECURABLES\\_VERBOSE](https://docs.databricks.com/error-messages/error-classes.html#insufficient_permissions_underlying_securables_verbose), [INVALID\\_S3\\_COPY\\_CREDENTIALS](https://docs.databricks.com/error-messages/error-classes.html#invalid_s3_copy_credentials), [UNAUTHORIZED\\_ACCESS](https://docs.databricks.com/error-messages/error-classes.html#unauthorized_access) |\n| `42601` | A character, token, or clause is invalid or missing. |\n| | [COLUMN\\_ALIASES\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#column_aliases_not_allowed), [COMMA\\_PRECEDING\\_CONSTRAINT\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#comma_preceding_constraint_error), [COPY\\_INTO\\_CREDENTIALS\\_REQUIRED](https://docs.databricks.com/error-messages/error-classes.html#copy_into_credentials_required), [COPY\\_INTO\\_ENCRYPTION\\_REQUIRED](https://docs.databricks.com/error-messages/error-classes.html#copy_into_encryption_required), [COPY\\_INTO\\_ENCRYPTION\\_REQUIRED\\_WITH\\_EXPECTED](https://docs.databricks.com/error-messages/error-classes.html#copy_into_encryption_required_with_expected), [COPY\\_INTO\\_SYNTAX\\_ERROR](https://docs.databricks.com/error-messages/copy-into-syntax-error-error-class.html), [CREDENTIAL\\_MISSING](https://docs.databricks.com/error-messages/error-classes.html#credential_missing), 
[DATA\\_SOURCE\\_OPTION\\_IS\\_REQUIRED](https://docs.databricks.com/error-messages/error-classes.html#data_source_option_is_required), [DELTA\\_CANNOT\\_CHANGE\\_LOCATION](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_change_location), [DELTA\\_CREATE\\_EXTERNAL\\_TABLE\\_WITHOUT\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#delta_create_external_table_without_schema), [DELTA\\_CREATE\\_TABLE\\_WITH\\_NON\\_EMPTY\\_LOCATION](https://docs.databricks.com/error-messages/error-classes.html#delta_create_table_with_non_empty_location), [DELTA\\_DUPLICATE\\_DOMAIN\\_METADATA\\_INTERNAL\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#delta_duplicate_domain_metadata_internal_error), [DELTA\\_FAILED\\_RECOGNIZE\\_PREDICATE](https://docs.databricks.com/error-messages/error-classes.html#delta_failed_recognize_predicate), [DELTA\\_ILLEGAL\\_USAGE](https://docs.databricks.com/error-messages/error-classes.html#delta_illegal_usage), [DELTA\\_MERGE\\_MISSING\\_WHEN](https://docs.databricks.com/error-messages/error-classes.html#delta_merge_missing_when), [DELTA\\_MERGE\\_RESOLVED\\_ATTRIBUTE\\_MISSING\\_FROM\\_INPUT](https://docs.databricks.com/error-messages/error-classes.html#delta_merge_resolved_attribute_missing_from_input), [DELTA\\_MERGE\\_UNRESOLVED\\_EXPRESSION](https://docs.databricks.com/error-messages/error-classes.html#delta_merge_unresolved_expression), [DELTA\\_NON\\_LAST\\_MATCHED\\_CLAUSE\\_OMIT\\_CONDITION](https://docs.databricks.com/error-messages/error-classes.html#delta_non_last_matched_clause_omit_condition), [DELTA\\_NON\\_LAST\\_NOT\\_MATCHED\\_BY\\_SOURCE\\_CLAUSE\\_OMIT\\_CONDITION](https://docs.databricks.com/error-messages/error-classes.html#delta_non_last_not_matched_by_source_clause_omit_condition), [DELTA\\_NON\\_LAST\\_NOT\\_MATCHED\\_CLAUSE\\_OMIT\\_CONDITION](https://docs.databricks.com/error-messages/error-classes.html#delta_non_last_not_matched_clause_omit_condition), 
[DELTA\\_NON\\_PARSABLE\\_TAG](https://docs.databricks.com/error-messages/error-classes.html#delta_non_parsable_tag), [DELTA\\_NO\\_START\\_FOR\\_CDC\\_READ](https://docs.databricks.com/error-messages/error-classes.html#delta_no_start_for_cdc_read), [DELTA\\_ONEOF\\_IN\\_TIMETRAVEL](https://docs.databricks.com/error-messages/error-classes.html#delta_oneof_in_timetravel), [DELTA\\_OPERATION\\_MISSING\\_PATH](https://docs.databricks.com/error-messages/error-classes.html#delta_operation_missing_path), [DELTA\\_UNEXPECTED\\_ACTION\\_EXPRESSION](https://docs.databricks.com/error-messages/error-classes.html#delta_unexpected_action_expression), [DELTA\\_UNKNOWN\\_PRIVILEGE](https://docs.databricks.com/error-messages/error-classes.html#delta_unknown_privilege), [DELTA\\_UNKNOWN\\_READ\\_LIMIT](https://docs.databricks.com/error-messages/error-classes.html#delta_unknown_read_limit), [DELTA\\_UNRECOGNIZED\\_COLUMN\\_CHANGE](https://docs.databricks.com/error-messages/error-classes.html#delta_unrecognized_column_change), [EVENT\\_LOG\\_REQUIRES\\_SHARED\\_COMPUTE](https://docs.databricks.com/error-messages/error-classes.html#event_log_requires_shared_compute), [FROM\\_JSON\\_CONFLICTING\\_SCHEMA\\_UPDATES](https://docs.databricks.com/error-messages/error-classes.html#from_json_conflicting_schema_updates), [FROM\\_JSON\\_CORRUPT\\_RECORD\\_COLUMN\\_IN\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#from_json_corrupt_record_column_in_schema), [FROM\\_JSON\\_CORRUPT\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#from_json_corrupt_schema), [FROM\\_JSON\\_INFERENCE\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#from_json_inference_failed), [FROM\\_JSON\\_INVALID\\_CONFIGURATION](https://docs.databricks.com/error-messages/from-json-invalid-configuration-error-class.html), [IDENTIFIER\\_TOO\\_MANY\\_NAME\\_PARTS](https://docs.databricks.com/error-messages/error-classes.html#identifier_too_many_name_parts), 
[INVALID\\_EXTRACT\\_FIELD](https://docs.databricks.com/error-messages/error-classes.html#invalid_extract_field), [INVALID\\_FORMAT](https://docs.databricks.com/error-messages/invalid-format-error-class.html), [INVALID\\_PARTITION\\_OPERATION](https://docs.databricks.com/error-messages/invalid-partition-operation-error-class.html), [INVALID\\_SHARED\\_ALIAS\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#invalid_shared_alias_name), [INVALID\\_WINDOW\\_SPEC\\_FOR\\_AGGREGATION\\_FUNC](https://docs.databricks.com/error-messages/error-classes.html#invalid_window_spec_for_aggregation_func), [LOCAL\\_MUST\\_WITH\\_SCHEMA\\_FILE](https://docs.databricks.com/error-messages/error-classes.html#local_must_with_schema_file), [MV\\_ST\\_ALTER\\_QUERY\\_INCORRECT\\_BACKING\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#mv_st_alter_query_incorrect_backing_type), [NOT\\_ALLOWED\\_IN\\_FROM](https://docs.databricks.com/error-messages/not-allowed-in-from-error-class.html), [NOT\\_A\\_CONSTANT\\_STRING](https://docs.databricks.com/error-messages/not-a-constant-string-error-class.html), [PARSE\\_SYNTAX\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#parse_syntax_error), [REF\\_DEFAULT\\_VALUE\\_IS\\_NOT\\_ALLOWED\\_IN\\_PARTITION](https://docs.databricks.com/error-messages/error-classes.html#ref_default_value_is_not_allowed_in_partition), [SORT\\_BY\\_WITHOUT\\_BUCKETING](https://docs.databricks.com/error-messages/error-classes.html#sort_by_without_bucketing), [SPECIFY\\_BUCKETING\\_IS\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#specify_bucketing_is_not_allowed), [SPECIFY\\_PARTITION\\_IS\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#specify_partition_is_not_allowed), [STDS\\_REQUIRED\\_OPTION\\_UNSPECIFIED](https://docs.databricks.com/error-messages/error-classes.html#stds_required_option_unspecified), 
[STREAMING\\_TABLE\\_OPERATION\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/streaming-table-operation-not-allowed-error-class.html), [UC\\_EXTERNAL\\_VOLUME\\_MISSING\\_LOCATION](https://docs.databricks.com/error-messages/error-classes.html#uc_external_volume_missing_location), [UC\\_LOCATION\\_FOR\\_MANAGED\\_VOLUME\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#uc_location_for_managed_volume_not_supported), [UNCLOSED\\_BRACKETED\\_COMMENT](https://docs.databricks.com/error-messages/error-classes.html#unclosed_bracketed_comment), [USER\\_DEFINED\\_FUNCTIONS](https://docs.databricks.com/error-messages/user-defined-functions-error-class.html), [WINDOW\\_FUNCTION\\_WITHOUT\\_OVER\\_CLAUSE](https://docs.databricks.com/error-messages/error-classes.html#window_function_without_over_clause), [WITH\\_CREDENTIAL](https://docs.databricks.com/error-messages/error-classes.html#with_credential), [WRITE\\_STREAM\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#write_stream_not_allowed) |\n| `42602` | A character that is invalid in a name has been detected. |\n| | [DATA\\_SOURCE\\_OPTION\\_CONTAINS\\_INVALID\\_CHARACTERS](https://docs.databricks.com/error-messages/error-classes.html#data_source_option_contains_invalid_characters), [INVALID\\_IDENTIFIER](https://docs.databricks.com/error-messages/error-classes.html#invalid_identifier), [INVALID\\_PROPERTY\\_KEY](https://docs.databricks.com/error-messages/error-classes.html#invalid_property_key), [INVALID\\_PROPERTY\\_VALUE](https://docs.databricks.com/error-messages/error-classes.html#invalid_property_value), [INVALID\\_SCHEMA\\_OR\\_RELATION\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#invalid_schema_or_relation_name) |\n| `42604` | An invalid numeric or string constant has been detected. 
|\n| | [AS\\_OF\\_JOIN](https://docs.databricks.com/error-messages/as-of-join-error-class.html), [DELTA\\_TIME\\_TRAVEL\\_INVALID\\_BEGIN\\_VALUE](https://docs.databricks.com/error-messages/error-classes.html#delta_time_travel_invalid_begin_value), [EMPTY\\_JSON\\_FIELD\\_VALUE](https://docs.databricks.com/error-messages/error-classes.html#empty_json_field_value), [INVALID\\_ESC](https://docs.databricks.com/error-messages/error-classes.html#invalid_esc), [INVALID\\_ESCAPE\\_CHAR](https://docs.databricks.com/error-messages/error-classes.html#invalid_escape_char), [INVALID\\_PIPELINE\\_ID](https://docs.databricks.com/error-messages/error-classes.html#invalid_pipeline_id), [INVALID\\_STAGING\\_PATH\\_IN\\_STAGING\\_ACCESS\\_QUERY](https://docs.databricks.com/error-messages/error-classes.html#invalid_staging_path_in_staging_access_query), [INVALID\\_TYPED\\_LITERAL](https://docs.databricks.com/error-messages/error-classes.html#invalid_typed_literal), [INVALID\\_UUID](https://docs.databricks.com/error-messages/error-classes.html#invalid_uuid) |\n| `42605` | The number of arguments specified for a scalar function is invalid. |\n| | [INCORRECT\\_NUMBER\\_OF\\_ARGUMENTS](https://docs.databricks.com/error-messages/error-classes.html#incorrect_number_of_arguments), [WRONG\\_NUM\\_ARGS](https://docs.databricks.com/error-messages/wrong-num-args-error-class.html) |\n| `42607` | An operand of an aggregate function or CONCAT operator is invalid. |\n| | [NESTED\\_AGGREGATE\\_FUNCTION](https://docs.databricks.com/error-messages/error-classes.html#nested_aggregate_function) |\n| `42608` | The use of NULL or DEFAULT in VALUES or an assignment statement is invalid. |\n| | [DEFAULT\\_PLACEMENT\\_INVALID](https://docs.databricks.com/error-messages/error-classes.html#default_placement_invalid), [NO\\_DEFAULT\\_COLUMN\\_VALUE\\_AVAILABLE](https://docs.databricks.com/error-messages/error-classes.html#no_default_column_value_available) |\n| `42613` | Clauses are mutually exclusive. 
|\n| | [DELTA\\_AMBIGUOUS\\_PATHS\\_IN\\_CREATE\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_ambiguous_paths_in_create_table), [DELTA\\_CANNOT\\_SET\\_LOCATION\\_ON\\_PATH\\_IDENTIFIER](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_set_location_on_path_identifier), [DELTA\\_CLONE\\_AMBIGUOUS\\_TARGET](https://docs.databricks.com/error-messages/error-classes.html#delta_clone_ambiguous_target), [DELTA\\_CLUSTERING\\_WITH\\_ZORDER\\_BY](https://docs.databricks.com/error-messages/error-classes.html#delta_clustering_with_zorder_by), [DELTA\\_CLUSTER\\_BY\\_WITH\\_BUCKETING](https://docs.databricks.com/error-messages/error-classes.html#delta_cluster_by_with_bucketing), [DELTA\\_CLUSTER\\_BY\\_WITH\\_PARTITIONED\\_BY](https://docs.databricks.com/error-messages/error-classes.html#delta_cluster_by_with_partitioned_by), [DELTA\\_FILE\\_LIST\\_AND\\_PATTERN\\_STRING\\_CONFLICT](https://docs.databricks.com/error-messages/error-classes.html#delta_file_list_and_pattern_string_conflict), [DELTA\\_OVERWRITE\\_SCHEMA\\_WITH\\_DYNAMIC\\_PARTITION\\_OVERWRITE](https://docs.databricks.com/error-messages/error-classes.html#delta_overwrite_schema_with_dynamic_partition_overwrite), [DELTA\\_PARTITION\\_SCHEMA\\_IN\\_ICEBERG\\_TABLES](https://docs.databricks.com/error-messages/error-classes.html#delta_partition_schema_in_iceberg_tables), [DELTA\\_REPLACE\\_WHERE\\_IN\\_OVERWRITE](https://docs.databricks.com/error-messages/error-classes.html#delta_replace_where_in_overwrite), [DELTA\\_REPLACE\\_WHERE\\_WITH\\_DYNAMIC\\_PARTITION\\_OVERWRITE](https://docs.databricks.com/error-messages/error-classes.html#delta_replace_where_with_dynamic_partition_overwrite), [DELTA\\_REPLACE\\_WHERE\\_WITH\\_FILTER\\_DATA\\_CHANGE\\_UNSET](https://docs.databricks.com/error-messages/error-classes.html#delta_replace_where_with_filter_data_change_unset), 
[DELTA\\_STARTING\\_VERSION\\_AND\\_TIMESTAMP\\_BOTH\\_SET](https://docs.databricks.com/error-messages/error-classes.html#delta_starting_version_and_timestamp_both_set), [DELTA\\_TABLE\\_LOCATION\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_table_location_mismatch), [DELTA\\_UNSUPPORTED\\_TIME\\_TRAVEL\\_MULTIPLE\\_FORMATS](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_time_travel_multiple_formats), [INCOMPATIBLE\\_JOIN\\_TYPES](https://docs.databricks.com/error-messages/error-classes.html#incompatible_join_types), [INVALID\\_LATERAL\\_JOIN\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#invalid_lateral_join_type), [INVALID\\_QUERY\\_MIXED\\_QUERY\\_PARAMETERS](https://docs.databricks.com/error-messages/error-classes.html#invalid_query_mixed_query_parameters), [INVALID\\_SINGLE\\_VARIANT\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#invalid_single_variant_column), [MANAGED\\_TABLE\\_WITH\\_CRED](https://docs.databricks.com/error-messages/error-classes.html#managed_table_with_cred), [MUTUALLY\\_EXCLUSIVE\\_CLAUSES](https://docs.databricks.com/error-messages/error-classes.html#mutually_exclusive_clauses), [NON\\_LAST\\_MATCHED\\_CLAUSE\\_OMIT\\_CONDITION](https://docs.databricks.com/error-messages/error-classes.html#non_last_matched_clause_omit_condition), [NON\\_LAST\\_NOT\\_MATCHED\\_BY\\_SOURCE\\_CLAUSE\\_OMIT\\_CONDITION](https://docs.databricks.com/error-messages/error-classes.html#non_last_not_matched_by_source_clause_omit_condition), [NON\\_LAST\\_NOT\\_MATCHED\\_BY\\_TARGET\\_CLAUSE\\_OMIT\\_CONDITION](https://docs.databricks.com/error-messages/error-classes.html#non_last_not_matched_by_target_clause_omit_condition), [STDS\\_CONFLICT\\_OPTIONS](https://docs.databricks.com/error-messages/error-classes.html#stds_conflict_options) |\n| `42614` | A duplicate keyword or clause is invalid. 
|\n| | [DELTA\\_MULTIPLE\\_CDC\\_BOUNDARY](https://docs.databricks.com/error-messages/error-classes.html#delta_multiple_cdc_boundary), [DELTA\\_MULTIPLE\\_CONF\\_FOR\\_SINGLE\\_COLUMN\\_IN\\_BLOOM\\_FILTER](https://docs.databricks.com/error-messages/error-classes.html#delta_multiple_conf_for_single_column_in_bloom_filter), [DUPLICATE\\_CLAUSES](https://docs.databricks.com/error-messages/error-classes.html#duplicate_clauses), [REPEATED\\_CLAUSE](https://docs.databricks.com/error-messages/error-classes.html#repeated_clause) |\n| `42616` | Invalid options specified |\n| | [BIGQUERY\\_OPTIONS\\_ARE\\_MUTUALLY\\_EXCLUSIVE](https://docs.databricks.com/error-messages/error-classes.html#bigquery_options_are_mutually_exclusive), [DELTA\\_ILLEGAL\\_OPTION](https://docs.databricks.com/error-messages/error-classes.html#delta_illegal_option), [DELTA\\_INVALID\\_IDEMPOTENT\\_WRITES\\_OPTIONS](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_idempotent_writes_options), [DELTA\\_UNSET\\_NON\\_EXISTENT\\_PROPERTY](https://docs.databricks.com/error-messages/error-classes.html#delta_unset_non_existent_property), [STDS\\_INVALID\\_OPTION\\_VALUE](https://docs.databricks.com/error-messages/stds-invalid-option-value-error-class.html), [UNSUPPORTED\\_COMMON\\_ANCESTOR\\_LOC\\_FOR\\_FILE\\_STREAM\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#unsupported_common_ancestor_loc_for_file_stream_source), [UNSUPPORTED\\_INITIAL\\_POSITION\\_AND\\_TRIGGER\\_PAIR\\_FOR\\_KINESIS\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#unsupported_initial_position_and_trigger_pair_for_kinesis_source) |\n| `42617` | The statement string is blank or empty. |\n| | [PARSE\\_EMPTY\\_STATEMENT](https://docs.databricks.com/error-messages/error-classes.html#parse_empty_statement) |\n| `42621` | The check constraint or generated column expression is invalid. 
|\n| | [DELTA\\_AGGREGATE\\_IN\\_GENERATED\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_aggregate_in_generated_column), [DELTA\\_INVALID\\_GENERATED\\_COLUMN\\_REFERENCES](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_generated_column_references), [DELTA\\_NON\\_BOOLEAN\\_CHECK\\_CONSTRAINT](https://docs.databricks.com/error-messages/error-classes.html#delta_non_boolean_check_constraint), [DELTA\\_UDF\\_IN\\_GENERATED\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_udf_in_generated_column), [DELTA\\_UNSUPPORTED\\_EXPRESSION\\_GENERATED\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_expression_generated_column), [UNSUPPORTED\\_EXPRESSION\\_GENERATED\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#unsupported_expression_generated_column) |\n| `42623` | A DEFAULT clause cannot be specified. |\n| | [GENERATED\\_COLUMN\\_WITH\\_DEFAULT\\_VALUE](https://docs.databricks.com/error-messages/error-classes.html#generated_column_with_default_value), [INVALID\\_DEFAULT\\_VALUE](https://docs.databricks.com/error-messages/invalid-default-value-error-class.html) |\n| `42701` | The same target is specified more than once for assignment in the same SQL statement. 
|\n| | [DELTA\\_CONFLICT\\_SET\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_conflict_set_column), [DELTA\\_DUPLICATE\\_COLUMNS\\_ON\\_INSERT](https://docs.databricks.com/error-messages/error-classes.html#delta_duplicate_columns_on_insert), [DELTA\\_DUPLICATE\\_COLUMNS\\_ON\\_UPDATE\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_duplicate_columns_on_update_table), [DELTA\\_DUPLICATE\\_DATA\\_SKIPPING\\_COLUMNS](https://docs.databricks.com/error-messages/error-classes.html#delta_duplicate_data_skipping_columns), [DUPLICATE\\_ASSIGNMENTS](https://docs.databricks.com/error-messages/error-classes.html#duplicate_assignments), [EXEC\\_IMMEDIATE\\_DUPLICATE\\_ARGUMENT\\_ALIASES](https://docs.databricks.com/error-messages/error-classes.html#exec_immediate_duplicate_argument_aliases) |\n| `42702` | A column reference is ambiguous, because of duplicate names. |\n| | [AMBIGUOUS\\_COLUMN\\_OR\\_FIELD](https://docs.databricks.com/error-messages/error-classes.html#ambiguous_column_or_field), [AMBIGUOUS\\_COLUMN\\_REFERENCE](https://docs.databricks.com/error-messages/error-classes.html#ambiguous_column_reference), [AMBIGUOUS\\_LATERAL\\_COLUMN\\_ALIAS](https://docs.databricks.com/error-messages/error-classes.html#ambiguous_lateral_column_alias), [DELTA\\_AMBIGUOUS\\_PARTITION\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_ambiguous_partition_column), [EXCEPT\\_OVERLAPPING\\_COLUMNS](https://docs.databricks.com/error-messages/error-classes.html#except_overlapping_columns) |\n| `42703` | An undefined column or parameter name was detected. 
|\n| | [COLUMN\\_NOT\\_DEFINED\\_IN\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#column_not_defined_in_table), [COLUMN\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#column_not_found), [DELTA\\_BLOOM\\_FILTER\\_DROP\\_ON\\_NON\\_EXISTING\\_COLUMNS](https://docs.databricks.com/error-messages/error-classes.html#delta_bloom_filter_drop_on_non_existing_columns), [DELTA\\_CANNOT\\_CREATE\\_BLOOM\\_FILTER\\_NON\\_EXISTING\\_COL](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_create_bloom_filter_non_existing_col), [DELTA\\_CANNOT\\_DROP\\_BLOOM\\_FILTER\\_ON\\_NON\\_INDEXED\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_drop_bloom_filter_on_non_indexed_column), [DELTA\\_CANNOT\\_RESOLVE\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_resolve_column), [DELTA\\_COLUMN\\_MAPPING\\_MAX\\_COLUMN\\_ID\\_NOT\\_SET](https://docs.databricks.com/error-messages/error-classes.html#delta_column_mapping_max_column_id_not_set), [DELTA\\_COLUMN\\_MAPPING\\_MAX\\_COLUMN\\_ID\\_NOT\\_SET\\_CORRECTLY](https://docs.databricks.com/error-messages/error-classes.html#delta_column_mapping_max_column_id_not_set_correctly), [DELTA\\_COLUMN\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_column_not_found), [DELTA\\_COLUMN\\_NOT\\_FOUND\\_IN\\_MERGE](https://docs.databricks.com/error-messages/error-classes.html#delta_column_not_found_in_merge), [DELTA\\_COLUMN\\_NOT\\_FOUND\\_IN\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#delta_column_not_found_in_schema), [DELTA\\_FAILED\\_FIND\\_ATTRIBUTE\\_IN\\_OUTPUT\\_COLUMNS](https://docs.databricks.com/error-messages/error-classes.html#delta_failed_find_attribute_in_output_columns), [DELTA\\_MISSING\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_missing_column), 
[DELTA\\_MISSING\\_SET\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_missing_set_column), [DELTA\\_PARTITION\\_COLUMN\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_partition_column_not_found), [DELTA\\_ZORDERING\\_COLUMN\\_DOES\\_NOT\\_EXIST](https://docs.databricks.com/error-messages/error-classes.html#delta_zordering_column_does_not_exist), [EXCEPT\\_RESOLVED\\_COLUMNS\\_WITHOUT\\_MATCH](https://docs.databricks.com/error-messages/error-classes.html#except_resolved_columns_without_match), [EXCEPT\\_UNRESOLVED\\_COLUMN\\_IN\\_STRUCT\\_EXPANSION](https://docs.databricks.com/error-messages/error-classes.html#except_unresolved_column_in_struct_expansion), [UNRESOLVED\\_COLUMN](https://docs.databricks.com/error-messages/unresolved-column-error-class.html), [UNRESOLVED\\_FIELD](https://docs.databricks.com/error-messages/unresolved-field-error-class.html), [UNRESOLVED\\_MAP\\_KEY](https://docs.databricks.com/error-messages/unresolved-map-key-error-class.html), [UNRESOLVED\\_USING\\_COLUMN\\_FOR\\_JOIN](https://docs.databricks.com/error-messages/error-classes.html#unresolved_using_column_for_join), [ZORDERBY\\_COLUMN\\_DOES\\_NOT\\_EXIST](https://docs.databricks.com/error-messages/error-classes.html#zorderby_column_does_not_exist) |\n| `42704` | An undefined object or constraint name was detected. 
|\n| | [AMBIGUOUS\\_REFERENCE](https://docs.databricks.com/error-messages/error-classes.html#ambiguous_reference), [CANNOT\\_RESOLVE\\_DATAFRAME\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#cannot_resolve_dataframe_column), [CANNOT\\_RESOLVE\\_STAR\\_EXPAND](https://docs.databricks.com/error-messages/error-classes.html#cannot_resolve_star_expand), [CODEC\\_SHORT\\_NAME\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#codec_short_name_not_found), [COLLATION\\_INVALID\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#collation_invalid_name), [DATA\\_SOURCE\\_NOT\\_EXIST](https://docs.databricks.com/error-messages/error-classes.html#data_source_not_exist), [DEFAULT\\_DATABASE\\_NOT\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#default_database_not_exists), [DELTA\\_COLUMN\\_PATH\\_NOT\\_NESTED](https://docs.databricks.com/error-messages/error-classes.html#delta_column_path_not_nested), [DELTA\\_CONSTRAINT\\_DOES\\_NOT\\_EXIST](https://docs.databricks.com/error-messages/error-classes.html#delta_constraint_does_not_exist), [DELTA\\_SHARING\\_CURRENT\\_RECIPIENT\\_PROPERTY\\_UNDEFINED](https://docs.databricks.com/error-messages/error-classes.html#delta_sharing_current_recipient_property_undefined), [ENCODER\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#encoder_not_found), [FIELD\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#field_not_found), [INDEX\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#index_not_found), [SCHEMA\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#schema_not_found), [UC\\_VOLUME\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#uc_volume_not_found), [UNRECOGNIZED\\_SQL\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#unrecognized_sql_type) |\n| `42710` | A duplicate object or constraint 
name was detected. |\n| | [ALTER\\_TABLE\\_COLUMN\\_DESCRIPTOR\\_DUPLICATE](https://docs.databricks.com/error-messages/error-classes.html#alter_table_column_descriptor_duplicate), [CREATE\\_TABLE\\_COLUMN\\_DESCRIPTOR\\_DUPLICATE](https://docs.databricks.com/error-messages/error-classes.html#create_table_column_descriptor_duplicate), [DATA\\_SOURCE\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#data_source_already_exists), [DELTA\\_CONSTRAINT\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#delta_constraint_already_exists), [DUPLICATED\\_METRICS\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#duplicated_metrics_name), [FIELDS\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#fields_already_exists), [FOUND\\_MULTIPLE\\_DATA\\_SOURCES](https://docs.databricks.com/error-messages/error-classes.html#found_multiple_data_sources), [INDEX\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#index_already_exists), [LOCATION\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#location_already_exists), [MULTIPLE\\_XML\\_DATA\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#multiple_xml_data_source) |\n| `42711` | A duplicate column name was detected in the object definition or ALTER TABLE statement. 
|\n| | [COLUMN\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#column_already_exists), [DELTA\\_DUPLICATE\\_COLUMNS\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_duplicate_columns_found), [DELTA\\_TABLE\\_ALREADY\\_CONTAINS\\_CDC\\_COLUMNS](https://docs.databricks.com/error-messages/error-classes.html#delta_table_already_contains_cdc_columns), [DUPLICATE\\_ROUTINE\\_RETURNS\\_COLUMNS](https://docs.databricks.com/error-messages/error-classes.html#duplicate_routine_returns_columns) |\n| `42713` | A duplicate object was detected in a list or is the same as an existing object. |\n| | [DUPLICATED\\_FIELD\\_NAME\\_IN\\_ARROW\\_STRUCT](https://docs.databricks.com/error-messages/error-classes.html#duplicated_field_name_in_arrow_struct), [STATIC\\_PARTITION\\_COLUMN\\_IN\\_INSERT\\_COLUMN\\_LIST](https://docs.databricks.com/error-messages/error-classes.html#static_partition_column_in_insert_column_list) |\n| `42723` | A routine with the same signature already exists in the schema, module, or compound block where it is defined. |\n| | [ROUTINE\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#routine_already_exists), [VARIABLE\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#variable_already_exists) |\n| `42734` | A duplicate parameter-name, SQL variable name, label, or condition-name was detected. 
|\n| | [COLUMN\\_MASKS\\_DUPLICATE\\_USING\\_COLUMN\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#column_masks_duplicate_using_column_name), [COLUMN\\_MASKS\\_USING\\_COLUMN\\_NAME\\_SAME\\_AS\\_TARGET\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#column_masks_using_column_name_same_as_target_column), [DUPLICATE\\_ROUTINE\\_PARAMETER\\_NAMES](https://docs.databricks.com/error-messages/error-classes.html#duplicate_routine_parameter_names), [ROW\\_LEVEL\\_SECURITY\\_DUPLICATE\\_COLUMN\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#row_level_security_duplicate_column_name) |\n| `4274K` | Invalid use of a named argument when invoking a routine. |\n| | [DUPLICATE\\_ROUTINE\\_PARAMETER\\_ASSIGNMENT](https://docs.databricks.com/error-messages/duplicate-routine-parameter-assignment-error-class.html), [NAMED\\_PARAMETERS\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#named_parameters_not_supported), [READ\\_FILES\\_AMBIGUOUS\\_ROUTINE\\_PARAMETERS](https://docs.databricks.com/error-messages/error-classes.html#read_files_ambiguous_routine_parameters), [READ\\_TVF\\_UNEXPECTED\\_REQUIRED\\_PARAMETER](https://docs.databricks.com/error-messages/error-classes.html#read_tvf_unexpected_required_parameter), [REQUIRED\\_PARAMETER\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#required_parameter_not_found), [UNEXPECTED\\_POSITIONAL\\_ARGUMENT](https://docs.databricks.com/error-messages/error-classes.html#unexpected_positional_argument), [UNKNOWN\\_POSITIONAL\\_ARGUMENT](https://docs.databricks.com/error-messages/error-classes.html#unknown_positional_argument), [UNRECOGNIZED\\_PARAMETER\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#unrecognized_parameter_name) |\n| `42802` | The number of target values is not the same as the number of source values. 
|\n| | [ASSIGNMENT\\_ARITY\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#assignment_arity_mismatch), [DELTA\\_INSERT\\_COLUMN\\_ARITY\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_insert_column_arity_mismatch), [DELTA\\_INSERT\\_COLUMN\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_insert_column_mismatch), [STATE\\_STORE\\_CANNOT\\_REMOVE\\_DEFAULT\\_COLUMN\\_FAMILY](https://docs.databricks.com/error-messages/error-classes.html#state_store_cannot_remove_default_column_family), [STATE\\_STORE\\_MULTIPLE\\_VALUES\\_PER\\_KEY](https://docs.databricks.com/error-messages/error-classes.html#state_store_multiple_values_per_key), [UDTF\\_ALIAS\\_NUMBER\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#udtf_alias_number_mismatch), [UDTF\\_INVALID\\_ALIAS\\_IN\\_REQUESTED\\_ORDERING\\_STRING\\_FROM\\_ANALYZE\\_METHOD](https://docs.databricks.com/error-messages/error-classes.html#udtf_invalid_alias_in_requested_ordering_string_from_analyze_method), [UDTF\\_INVALID\\_REQUESTED\\_SELECTED\\_EXPRESSION\\_FROM\\_ANALYZE\\_METHOD\\_REQUIRES\\_ALIAS](https://docs.databricks.com/error-messages/error-classes.html#udtf_invalid_requested_selected_expression_from_analyze_method_requires_alias) |\n| `42803` | A column reference in the SELECT or HAVING clause is invalid, because it is not a grouping column; or a column reference in the GROUP BY clause is invalid. 
|\n| | [GROUPING\\_COLUMN\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#grouping_column_mismatch), [GROUPING\\_ID\\_COLUMN\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#grouping_id_column_mismatch), [MISSING\\_AGGREGATION](https://docs.databricks.com/error-messages/missing-aggregation-error-class.html), [MISSING\\_GROUP\\_BY](https://docs.databricks.com/error-messages/error-classes.html#missing_group_by), [UNRESOLVED\\_ALL\\_IN\\_GROUP\\_BY](https://docs.databricks.com/error-messages/error-classes.html#unresolved_all_in_group_by) |\n| `42805` | An integer in the ORDER BY clause does not identify a column of the result table. |\n| | [GROUP\\_BY\\_POS\\_OUT\\_OF\\_RANGE](https://docs.databricks.com/error-messages/error-classes.html#group_by_pos_out_of_range), [ORDER\\_BY\\_POS\\_OUT\\_OF\\_RANGE](https://docs.databricks.com/error-messages/error-classes.html#order_by_pos_out_of_range) |\n| `42806` | A value cannot be assigned to a variable, because the data types are not compatible. |\n| | [DELTA\\_MERGE\\_INCOMPATIBLE\\_DECIMAL\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#delta_merge_incompatible_decimal_type) |\n| `42807` | The data-change statement is not permitted on this object. |\n| | [DELTA\\_CHANGE\\_TABLE\\_FEED\\_DISABLED](https://docs.databricks.com/error-messages/error-classes.html#delta_change_table_feed_disabled), [DELTA\\_UNSUPPORTED\\_WRITES\\_STAGED\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_writes_staged_table) |\n| `42809` | The identified object is not the type of object to which the statement applies. 
|\n| | [DELTA\\_CANNOT\\_DESCRIBE\\_VIEW\\_HISTORY](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_describe_view_history), [DELTA\\_CANNOT\\_MODIFY\\_APPEND\\_ONLY](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_modify_append_only), [DELTA\\_SHOW\\_PARTITION\\_IN\\_NON\\_PARTITIONED\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_show_partition_in_non_partitioned_table), [DELTA\\_TABLE\\_NOT\\_SUPPORTED\\_IN\\_OP](https://docs.databricks.com/error-messages/error-classes.html#delta_table_not_supported_in_op), [DELTA\\_UNSUPPORTED\\_DESCRIBE\\_DETAIL\\_VIEW](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_describe_detail_view), [EXPECT\\_PERMANENT\\_VIEW\\_NOT\\_TEMP](https://docs.databricks.com/error-messages/error-classes.html#expect_permanent_view_not_temp), [EXPECT\\_TABLE\\_NOT\\_VIEW](https://docs.databricks.com/error-messages/expect-table-not-view-error-class.html), [EXPECT\\_VIEW\\_NOT\\_TABLE](https://docs.databricks.com/error-messages/expect-view-not-table-error-class.html), [FORBIDDEN\\_OPERATION](https://docs.databricks.com/error-messages/error-classes.html#forbidden_operation), [INVALID\\_DEST\\_CATALOG](https://docs.databricks.com/error-messages/error-classes.html#invalid_dest_catalog), [INVALID\\_SOURCE\\_CATALOG](https://docs.databricks.com/error-messages/error-classes.html#invalid_source_catalog), [INVALID\\_UPGRADE\\_SYNTAX](https://docs.databricks.com/error-messages/error-classes.html#invalid_upgrade_syntax), [NOT\\_A\\_PARTITIONED\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#not_a_partitioned_table), [UNSUPPORTED\\_INSERT](https://docs.databricks.com/error-messages/unsupported-insert-error-class.html), [WRONG\\_COMMAND\\_FOR\\_OBJECT\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#wrong_command_for_object_type) |\n| `42816` | A datetime value or duration in an expression is invalid. 
|\n| | [DELTA\\_TIMESTAMP\\_GREATER\\_THAN\\_COMMIT](https://docs.databricks.com/error-messages/error-classes.html#delta_timestamp_greater_than_commit), [DELTA\\_TIMESTAMP\\_INVALID](https://docs.databricks.com/error-messages/error-classes.html#delta_timestamp_invalid) |\n| `42818` | The operands of an operator or function are not compatible or comparable. |\n| | [INCOMPARABLE\\_PIVOT\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#incomparable_pivot_column) |\n| `42822` | An expression in the ORDER BY clause or GROUP BY clause is not valid. |\n| | [EXPRESSION\\_TYPE\\_IS\\_NOT\\_ORDERABLE](https://docs.databricks.com/error-messages/error-classes.html#expression_type_is_not_orderable), [GROUP\\_EXPRESSION\\_TYPE\\_IS\\_NOT\\_ORDERABLE](https://docs.databricks.com/error-messages/error-classes.html#group_expression_type_is_not_orderable) |\n| `42823` | Multiple columns are returned from a subquery that only allows one column. |\n| | [INVALID\\_SUBQUERY\\_EXPRESSION](https://docs.databricks.com/error-messages/invalid-subquery-expression-error-class.html) |\n| `42825` | The rows of UNION, INTERSECT, EXCEPT, or VALUES do not have compatible columns. |\n| | [CANNOT\\_MERGE\\_INCOMPATIBLE\\_DATA\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#cannot_merge_incompatible_data_type), [INCOMPATIBLE\\_COLUMN\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#incompatible_column_type) |\n| `42826` | The rows of UNION, INTERSECT, EXCEPT, or VALUES do not have the same number of columns. |\n| | [NUM\\_COLUMNS\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#num_columns_mismatch), [NUM\\_TABLE\\_VALUE\\_ALIASES\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#num_table_value_aliases_mismatch) |\n| `42830` | The foreign key does not conform to the description of the parent key. 
|\n| | [FOREIGN\\_KEY\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#foreign_key_mismatch) |\n| `42832` | The operation is not allowed on system objects. |\n| | [BUILT\\_IN\\_CATALOG](https://docs.databricks.com/error-messages/error-classes.html#built_in_catalog), [CANNOT\\_DELETE\\_SYSTEM\\_OWNED](https://docs.databricks.com/error-messages/error-classes.html#cannot_delete_system_owned), [EVENT\\_LOG\\_UNSUPPORTED\\_TABLE\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#event_log_unsupported_table_type), [MODIFY\\_BUILTIN\\_CATALOG](https://docs.databricks.com/error-messages/error-classes.html#modify_builtin_catalog), [SAMPLE\\_TABLE\\_PERMISSIONS](https://docs.databricks.com/error-messages/error-classes.html#sample_table_permissions) |\n| `42837` | The column cannot be altered, because its attributes are not compatible with the current column attributes. |\n| | [DELTA\\_ALTER\\_TABLE\\_CHANGE\\_COL\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#delta_alter_table_change_col_not_supported) |\n| `42845` | An invalid use of a NOT DETERMINISTIC or EXTERNAL ACTION function was detected. |\n| | [AGGREGATE\\_FUNCTION\\_WITH\\_NONDETERMINISTIC\\_EXPRESSION](https://docs.databricks.com/error-messages/error-classes.html#aggregate_function_with_nondeterministic_expression) |\n| `42846` | Cast from source type to target type is not supported. 
|\n| | [CANNOT\\_CAST\\_DATATYPE](https://docs.databricks.com/error-messages/error-classes.html#cannot_cast_datatype), [CANNOT\\_CONVERT\\_PROTOBUF\\_FIELD\\_TYPE\\_TO\\_SQL\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#cannot_convert_protobuf_field_type_to_sql_type), [CANNOT\\_CONVERT\\_PROTOBUF\\_MESSAGE\\_TYPE\\_TO\\_SQL\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#cannot_convert_protobuf_message_type_to_sql_type), [CANNOT\\_CONVERT\\_SQL\\_TYPE\\_TO\\_PROTOBUF\\_FIELD\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#cannot_convert_sql_type_to_protobuf_field_type), [CANNOT\\_CONVERT\\_SQL\\_VALUE\\_TO\\_PROTOBUF\\_ENUM\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#cannot_convert_sql_value_to_protobuf_enum_type), [CANNOT\\_UP\\_CAST\\_DATATYPE](https://docs.databricks.com/error-messages/error-classes.html#cannot_up_cast_datatype), [DELTA\\_UPDATE\\_SCHEMA\\_MISMATCH\\_EXPRESSION](https://docs.databricks.com/error-messages/error-classes.html#delta_update_schema_mismatch_expression), [EXPRESSION\\_DECODING\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#expression_decoding_failed), [EXPRESSION\\_ENCODING\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#expression_encoding_failed), [UNEXPECTED\\_SERIALIZER\\_FOR\\_CLASS](https://docs.databricks.com/error-messages/error-classes.html#unexpected_serializer_for_class) |\n| `42852` | The privileges specified in GRANT or REVOKE are invalid or inconsistent. (For example, GRANT ALTER on a view.) |\n| | [INVALID\\_PRIVILEGE](https://docs.databricks.com/error-messages/error-classes.html#invalid_privilege) |\n| `42883` | No routine was found with a matching signature. 
|\n| | [ROUTINE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#routine_not_found), [UNRESOLVABLE\\_TABLE\\_VALUED\\_FUNCTION](https://docs.databricks.com/error-messages/error-classes.html#unresolvable_table_valued_function), [UNRESOLVED\\_ROUTINE](https://docs.databricks.com/error-messages/unresolved-routine-error-class.html), [UNRESOLVED\\_VARIABLE](https://docs.databricks.com/error-messages/error-classes.html#unresolved_variable), [VARIABLE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#variable_not_found) |\n| `42887` | The function or table-reference is not valid in the context where it occurs. |\n| | [CYCLIC\\_FUNCTION\\_REFERENCE](https://docs.databricks.com/error-messages/error-classes.html#cyclic_function_reference), [DELTA\\_SHARING\\_INVALID\\_OP\\_IN\\_EXTERNAL\\_SHARED\\_VIEW](https://docs.databricks.com/error-messages/error-classes.html#delta_sharing_invalid_op_in_external_shared_view), [INVALID\\_CURRENT\\_RECIPIENT\\_USAGE](https://docs.databricks.com/error-messages/error-classes.html#invalid_current_recipient_usage), [NOT\\_A\\_SCALAR\\_FUNCTION](https://docs.databricks.com/error-messages/error-classes.html#not_a_scalar_function), [NOT\\_A\\_TABLE\\_FUNCTION](https://docs.databricks.com/error-messages/error-classes.html#not_a_table_function) |\n| `42891` | A duplicate constraint already exists. |\n| | [MULTIPLE\\_MATCHING\\_CONSTRAINTS](https://docs.databricks.com/error-messages/error-classes.html#multiple_matching_constraints) |\n| `428C4` | The number of elements on each side of the predicate operator is not the same. |\n| | [UNPIVOT\\_VALUE\\_SIZE\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#unpivot_value_size_mismatch) |\n| `428EK` | The schema qualifier is not valid. 
|\n| | [TEMP\\_VIEW\\_NAME\\_TOO\\_MANY\\_NAME\\_PARTS](https://docs.databricks.com/error-messages/error-classes.html#temp_view_name_too_many_name_parts) |\n| `428FR` | A column cannot be altered as specified. |\n| | [CANNOT\\_ALTER\\_PARTITION\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#cannot_alter_partition_column) |\n| `428FT` | The partitioning clause specified on CREATE or ALTER is not valid. |\n| | [DELTA\\_CANNOT\\_USE\\_ALL\\_COLUMNS\\_FOR\\_PARTITION](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_use_all_columns_for_partition), [PARTITIONS\\_ALREADY\\_EXIST](https://docs.databricks.com/error-messages/error-classes.html#partitions_already_exist), [PARTITIONS\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#partitions_not_found) |\n| `428GU` | A table must include at least one column that is not implicitly hidden. |\n| | [DELTA\\_EMPTY\\_DATA](https://docs.databricks.com/error-messages/error-classes.html#delta_empty_data), [DELTA\\_READ\\_TABLE\\_WITHOUT\\_COLUMNS](https://docs.databricks.com/error-messages/error-classes.html#delta_read_table_without_columns), [DELTA\\_TARGET\\_TABLE\\_FINAL\\_SCHEMA\\_EMPTY](https://docs.databricks.com/error-messages/error-classes.html#delta_target_table_final_schema_empty) |\n| `428H2` | Data type is not supported in the context where it is being used. |\n| | [EXCEPT\\_NESTED\\_COLUMN\\_INVALID\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#except_nested_column_invalid_type) |\n| `428HD` | The statement cannot be processed because a column mask cannot be applied or the definition of the mask conflicts with the statement. 
|\n| | [QUERIED\\_TABLE\\_INCOMPATIBLE\\_WITH\\_COLUMN\\_MASK\\_POLICY](https://docs.databricks.com/error-messages/queried-table-incompatible-with-column-mask-policy-error-class.html), [QUERIED\\_TABLE\\_INCOMPATIBLE\\_WITH\\_ROW\\_LEVEL\\_SECURITY\\_POLICY](https://docs.databricks.com/error-messages/queried-table-incompatible-with-row-level-security-policy-error-class.html) |\n| `42902` | The object of the INSERT, UPDATE, or DELETE is also identified (possibly implicitly through a view) in a FROM clause. |\n| | [UNSUPPORTED\\_OVERWRITE](https://docs.databricks.com/error-messages/unsupported-overwrite-error-class.html) |\n| `42903` | Invalid use of an aggregate function or OLAP function. |\n| | [DELTA\\_AGGREGATION\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#delta_aggregation_not_supported), [GROUP\\_BY\\_AGGREGATE](https://docs.databricks.com/error-messages/group-by-aggregate-error-class.html), [GROUP\\_BY\\_POS\\_AGGREGATE](https://docs.databricks.com/error-messages/error-classes.html#group_by_pos_aggregate), [INVALID\\_WHERE\\_CONDITION](https://docs.databricks.com/error-messages/error-classes.html#invalid_where_condition) |\n| `42908` | The statement does not include a required column list. |\n| | [DELTA\\_CLUSTER\\_BY\\_SCHEMA\\_NOT\\_PROVIDED](https://docs.databricks.com/error-messages/error-classes.html#delta_cluster_by_schema_not_provided), [DELTA\\_SCHEMA\\_NOT\\_PROVIDED](https://docs.databricks.com/error-messages/error-classes.html#delta_schema_not_provided), [SPECIFY\\_CLUSTER\\_BY\\_WITH\\_BUCKETING\\_IS\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#specify_cluster_by_with_bucketing_is_not_allowed), [SPECIFY\\_CLUSTER\\_BY\\_WITH\\_PARTITIONED\\_BY\\_IS\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#specify_cluster_by_with_partitioned_by_is_not_allowed) |\n| `42939` | The name cannot be used, because the specified identifier is reserved for system use. 
|\n| | [DELTA\\_CANNOT\\_CHANGE\\_PROVIDER](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_change_provider), [DELTA\\_CANNOT\\_MODIFY\\_TABLE\\_PROPERTY](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_modify_table_property), [RESERVED\\_CDC\\_COLUMNS\\_ON\\_WRITE](https://docs.databricks.com/error-messages/error-classes.html#reserved_cdc_columns_on_write), [ROUTINE\\_USES\\_SYSTEM\\_RESERVED\\_CLASS\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#routine_uses_system_reserved_class_name) |\n| `42996` | A specified column may not be used in a partition key. |\n| | [DELTA\\_INVALID\\_PARTITION\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_partition_column), [DELTA\\_INVALID\\_PARTITION\\_COLUMN\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_partition_column_name), [DELTA\\_INVALID\\_PARTITION\\_COLUMN\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_partition_column_type) |\n| `429BB` | The data type of a column, parameter, or SQL variable is not supported. |\n| | [CANNOT\\_RECOGNIZE\\_HIVE\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#cannot_recognize_hive_type) |\n| `429BQ` | The specified alter of the data type or attribute is not allowed. 
|\n| | [DELTA\\_AMBIGUOUS\\_DATA\\_TYPE\\_CHANGE](https://docs.databricks.com/error-messages/error-classes.html#delta_ambiguous_data_type_change), [DELTA\\_CANNOT\\_CHANGE\\_DATA\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_change_data_type), [DELTA\\_CANNOT\\_UPDATE\\_ARRAY\\_FIELD](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_update_array_field), [DELTA\\_CANNOT\\_UPDATE\\_MAP\\_FIELD](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_update_map_field), [DELTA\\_CANNOT\\_UPDATE\\_OTHER\\_FIELD](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_update_other_field), [DELTA\\_CANNOT\\_UPDATE\\_STRUCT\\_FIELD](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_update_struct_field) |\n| `42K01` | data type not fully specified |\n| | [DATATYPE\\_MISSING\\_SIZE](https://docs.databricks.com/error-messages/error-classes.html#datatype_missing_size), [INCOMPLETE\\_TYPE\\_DEFINITION](https://docs.databricks.com/error-messages/incomplete-type-definition-error-class.html) |\n| `42K02` | data source not found |\n| | [DATA\\_SOURCE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#data_source_not_found), [STREAM\\_NOT\\_FOUND\\_FOR\\_KINESIS\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#stream_not_found_for_kinesis_source) |\n| `42K03` | File not found |\n| | [BATCH\\_METADATA\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#batch_metadata_not_found), [CANNOT\\_LOAD\\_PROTOBUF\\_CLASS](https://docs.databricks.com/error-messages/error-classes.html#cannot_load_protobuf_class), [CLOUD\\_FILE\\_SOURCE\\_FILE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#cloud_file_source_file_not_found), [DATA\\_SOURCE\\_TABLE\\_SCHEMA\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#data_source_table_schema_mismatch), 
[DEFAULT\\_FILE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#default_file_not_found), [DELTA\\_CHANGE\\_DATA\\_FILE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_change_data_file_not_found), [DELTA\\_CHECKPOINT\\_NON\\_EXIST\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_checkpoint_non_exist_table), [DELTA\\_CREATE\\_EXTERNAL\\_TABLE\\_WITHOUT\\_TXN\\_LOG](https://docs.databricks.com/error-messages/error-classes.html#delta_create_external_table_without_txn_log), [DELTA\\_DELETED\\_PARQUET\\_FILE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_deleted_parquet_file_not_found), [DELTA\\_EMPTY\\_DIRECTORY](https://docs.databricks.com/error-messages/error-classes.html#delta_empty_directory), [DELTA\\_FILE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_file_not_found), [DELTA\\_FILE\\_NOT\\_FOUND\\_DETAILED](https://docs.databricks.com/error-messages/error-classes.html#delta_file_not_found_detailed), [DELTA\\_FILE\\_OR\\_DIR\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_file_or_dir_not_found), [DELTA\\_FILE\\_TO\\_OVERWRITE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_file_to_overwrite_not_found), [DELTA\\_LOG\\_FILE\\_NOT\\_FOUND\\_FOR\\_STREAMING\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#delta_log_file_not_found_for_streaming_source), [DELTA\\_PATH\\_DOES\\_NOT\\_EXIST](https://docs.databricks.com/error-messages/error-classes.html#delta_path_does_not_exist), [DELTA\\_SHALLOW\\_CLONE\\_FILE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_shallow_clone_file_not_found), [DELTA\\_TRUNCATED\\_TRANSACTION\\_LOG](https://docs.databricks.com/error-messages/error-classes.html#delta_truncated_transaction_log), 
[LOAD\\_DATA\\_PATH\\_NOT\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#load_data_path_not_exists), [PATH\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#path_not_found), [RENAME\\_SRC\\_PATH\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#rename_src_path_not_found), [STDS\\_FAILED\\_TO\\_READ\\_STATE\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#stds_failed_to_read_state_schema), [STREAMING\\_STATEFUL\\_OPERATOR\\_NOT\\_MATCH\\_IN\\_STATE\\_METADATA](https://docs.databricks.com/error-messages/error-classes.html#streaming_stateful_operator_not_match_in_state_metadata) |\n| `42K04` | Duplicate file |\n| | [DELTA\\_FILE\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#delta_file_already_exists), [DELTA\\_LOG\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#delta_log_already_exists), [DELTA\\_PATH\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#delta_path_exists), [FAILED\\_RENAME\\_PATH](https://docs.databricks.com/error-messages/error-classes.html#failed_rename_path), [FILE\\_IN\\_STAGING\\_PATH\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#file_in_staging_path_already_exists), [KINESIS\\_FETCHED\\_SHARD\\_LESS\\_THAN\\_TRACKED\\_SHARD](https://docs.databricks.com/error-messages/error-classes.html#kinesis_fetched_shard_less_than_tracked_shard), [PARTITION\\_LOCATION\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#partition_location_already_exists), [PATH\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#path_already_exists) |\n| `42K05` | Name is not valid |\n| | [CLEANROOM\\_INVALID\\_SHARED\\_DATA\\_OBJECT\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#cleanroom_invalid_shared_data_object_name), 
[COLUMN\\_MASKS\\_MULTI\\_PART\\_TARGET\\_COLUMN\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#column_masks_multi_part_target_column_name), [COLUMN\\_MASKS\\_MULTI\\_PART\\_USING\\_COLUMN\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#column_masks_multi_part_using_column_name), [DELTA\\_INVALID\\_CHARACTERS\\_IN\\_COLUMN\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_characters_in_column_name), [DELTA\\_INVALID\\_CHARACTERS\\_IN\\_COLUMN\\_NAMES](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_characters_in_column_names), [DELTA\\_INVALID\\_COLUMN\\_NAMES\\_WHEN\\_REMOVING\\_COLUMN\\_MAPPING](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_column_names_when_removing_column_mapping), [DELTA\\_NESTED\\_FIELDS\\_NEED\\_RENAME](https://docs.databricks.com/error-messages/error-classes.html#delta_nested_fields_need_rename), [DELTA\\_NON\\_SINGLE\\_PART\\_NAMESPACE\\_FOR\\_CATALOG](https://docs.databricks.com/error-messages/error-classes.html#delta_non_single_part_namespace_for_catalog), [INVALID\\_EMPTY\\_LOCATION](https://docs.databricks.com/error-messages/error-classes.html#invalid_empty_location), [REQUIRES\\_SINGLE\\_PART\\_NAMESPACE](https://docs.databricks.com/error-messages/error-classes.html#requires_single_part_namespace), [ROW\\_LEVEL\\_SECURITY\\_MULTI\\_PART\\_COLUMN\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#row_level_security_multi_part_column_name) |\n| `42K06` | Invalid type for options |\n| | [DELTA\\_PROTOCOL\\_PROPERTY\\_NOT\\_INT](https://docs.databricks.com/error-messages/error-classes.html#delta_protocol_property_not_int), [INVALID\\_OPTIONS](https://docs.databricks.com/error-messages/invalid-options-error-class.html) |\n| `42K07` | Not a valid schema literal |\n| | [INVALID\\_SCHEMA](https://docs.databricks.com/error-messages/invalid-schema-error-class.html) |\n| `42K08` | Not a 
constant |\n| | [ARGUMENT\\_NOT\\_CONSTANT](https://docs.databricks.com/error-messages/error-classes.html#argument_not_constant), [INVALID\\_SQL\\_ARG](https://docs.databricks.com/error-messages/error-classes.html#invalid_sql_arg), [NON\\_FOLDABLE\\_ARGUMENT](https://docs.databricks.com/error-messages/error-classes.html#non_foldable_argument), [NON\\_LITERAL\\_PIVOT\\_VALUES](https://docs.databricks.com/error-messages/error-classes.html#non_literal_pivot_values), [SEED\\_EXPRESSION\\_IS\\_UNFOLDABLE](https://docs.databricks.com/error-messages/error-classes.html#seed_expression_is_unfoldable) |\n| `42K09` | Data type mismatch |\n| | [DATATYPE\\_MISMATCH](https://docs.databricks.com/error-messages/datatype-mismatch-error-class.html), [DELTA\\_GENERATED\\_COLUMNS\\_DATA\\_TYPE\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_generated_columns_data_type_mismatch), [DELTA\\_GENERATED\\_COLUMNS\\_EXPR\\_TYPE\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_generated_columns_expr_type_mismatch), [DELTA\\_GENERATED\\_COLUMN\\_UPDATE\\_TYPE\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_generated_column_update_type_mismatch), [DELTA\\_MERGE\\_INCOMPATIBLE\\_DATATYPE](https://docs.databricks.com/error-messages/error-classes.html#delta_merge_incompatible_datatype), [DELTA\\_NOT\\_NULL\\_COLUMN\\_NOT\\_FOUND\\_IN\\_STRUCT](https://docs.databricks.com/error-messages/error-classes.html#delta_not_null_column_not_found_in_struct), [EVENT\\_TIME\\_IS\\_NOT\\_ON\\_TIMESTAMP\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#event_time_is_not_on_timestamp_type), [INVALID\\_VARIABLE\\_TYPE\\_FOR\\_QUERY\\_EXECUTE\\_IMMEDIATE](https://docs.databricks.com/error-messages/error-classes.html#invalid_variable_type_for_query_execute_immediate), [PIVOT\\_VALUE\\_DATA\\_TYPE\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#pivot_value_data_type_mismatch), 
[UNEXPECTED\\_INPUT\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#unexpected_input_type), [UNPIVOT\\_VALUE\\_DATA\\_TYPE\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#unpivot_value_data_type_mismatch) |\n| `42K0A` | Invalid UNPIVOT clause |\n| | [UNPIVOT\\_REQUIRES\\_ATTRIBUTES](https://docs.databricks.com/error-messages/error-classes.html#unpivot_requires_attributes), [UNPIVOT\\_REQUIRES\\_VALUE\\_COLUMNS](https://docs.databricks.com/error-messages/error-classes.html#unpivot_requires_value_columns) |\n| `42K0B` | Legacy feature blocked |\n| | [INCONSISTENT\\_BEHAVIOR\\_CROSS\\_VERSION](https://docs.databricks.com/error-messages/inconsistent-behavior-cross-version-error-class.html) |\n| `42K0C` | Ambiguous reference to constraint |\n| | [AMBIGUOUS\\_CONSTRAINT](https://docs.databricks.com/error-messages/error-classes.html#ambiguous_constraint), [CANNOT\\_DROP\\_AMBIGUOUS\\_CONSTRAINT](https://docs.databricks.com/error-messages/error-classes.html#cannot_drop_ambiguous_constraint) |\n| `42K0D` | Invalid lambda function |\n| | [INVALID\\_LAMBDA\\_FUNCTION\\_CALL](https://docs.databricks.com/error-messages/invalid-lambda-function-call-error-class.html) |\n| `42K0E` | An expression is not valid in the context it is used |\n| | [INVALID\\_LIMIT\\_LIKE\\_EXPRESSION](https://docs.databricks.com/error-messages/invalid-limit-like-expression-error-class.html), [INVALID\\_NON\\_DETERMINISTIC\\_EXPRESSIONS](https://docs.databricks.com/error-messages/error-classes.html#invalid_non_deterministic_expressions), [INVALID\\_OBSERVED\\_METRICS](https://docs.databricks.com/error-messages/invalid-observed-metrics-error-class.html), [INVALID\\_TIME\\_TRAVEL\\_SPEC](https://docs.databricks.com/error-messages/error-classes.html#invalid_time_travel_spec), [INVALID\\_TIME\\_TRAVEL\\_TIMESTAMP\\_EXPR](https://docs.databricks.com/error-messages/invalid-time-travel-timestamp-expr-error-class.html), 
[JOIN\\_CONDITION\\_IS\\_NOT\\_BOOLEAN\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#join_condition_is_not_boolean_type), [MULTIPLE\\_TIME\\_TRAVEL\\_SPEC](https://docs.databricks.com/error-messages/error-classes.html#multiple_time_travel_spec), [MULTI\\_SOURCES\\_UNSUPPORTED\\_FOR\\_EXPRESSION](https://docs.databricks.com/error-messages/error-classes.html#multi_sources_unsupported_for_expression), [NO\\_MERGE\\_ACTION\\_SPECIFIED](https://docs.databricks.com/error-messages/error-classes.html#no_merge_action_specified), [ONLY\\_SECRET\\_FUNCTION\\_SUPPORTED\\_HERE](https://docs.databricks.com/error-messages/error-classes.html#only_secret_function_supported_here), [SECRET\\_FUNCTION\\_INVALID\\_LOCATION](https://docs.databricks.com/error-messages/error-classes.html#secret_function_invalid_location), [UNSUPPORTED\\_EXPR\\_FOR\\_OPERATOR](https://docs.databricks.com/error-messages/error-classes.html#unsupported_expr_for_operator), [UNSUPPORTED\\_EXPR\\_FOR\\_PARAMETER](https://docs.databricks.com/error-messages/error-classes.html#unsupported_expr_for_parameter), [UNSUPPORTED\\_GENERATOR](https://docs.databricks.com/error-messages/unsupported-generator-error-class.html), [UNSUPPORTED\\_GROUPING\\_EXPRESSION](https://docs.databricks.com/error-messages/error-classes.html#unsupported_grouping_expression), [UNSUPPORTED\\_MERGE\\_CONDITION](https://docs.databricks.com/error-messages/unsupported-merge-condition-error-class.html), [UNTYPED\\_SCALA\\_UDF](https://docs.databricks.com/error-messages/error-classes.html#untyped_scala_udf), [WINDOW\\_FUNCTION\\_AND\\_FRAME\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#window_function_and_frame_mismatch) |\n| `42K0F` | A persisted object cannot reference a temporary object. 
|\n| | [INVALID\\_TEMP\\_OBJ\\_REFERENCE](https://docs.databricks.com/error-messages/error-classes.html#invalid_temp_obj_reference) |\n| `42K0G` | A protobuf is invalid |\n| | [PROTOBUF\\_DEPENDENCY\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#protobuf_dependency_not_found), [PROTOBUF\\_DESCRIPTOR\\_FILE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#protobuf_descriptor_file_not_found), [PROTOBUF\\_FIELD\\_MISSING](https://docs.databricks.com/error-messages/error-classes.html#protobuf_field_missing), [PROTOBUF\\_FIELD\\_MISSING\\_IN\\_SQL\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#protobuf_field_missing_in_sql_schema), [PROTOBUF\\_FIELD\\_TYPE\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#protobuf_field_type_mismatch), [PROTOBUF\\_MESSAGE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#protobuf_message_not_found), [PROTOBUF\\_TYPE\\_NOT\\_SUPPORT](https://docs.databricks.com/error-messages/error-classes.html#protobuf_type_not_support), [RECURSIVE\\_PROTOBUF\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#recursive_protobuf_schema), [SCHEMA\\_REGISTRY\\_CONFIGURATION\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#schema_registry_configuration_error), [UNABLE\\_TO\\_CONVERT\\_TO\\_PROTOBUF\\_MESSAGE\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#unable_to_convert_to_protobuf_message_type), [UNKNOWN\\_PROTOBUF\\_MESSAGE\\_TYPE](https://docs.databricks.com/error-messages/error-classes.html#unknown_protobuf_message_type) |\n| `42K0H` | A cyclic invocation has been detected. |\n| | [RECURSIVE\\_VIEW](https://docs.databricks.com/error-messages/error-classes.html#recursive_view) |\n| `42K0I` | SQL Config not found. 
|\n| | [SQL\\_CONF\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#sql_conf_not_found) |\n| `42K0J` | Property not found. |\n| | [UNSET\\_NONEXISTENT\\_PROPERTIES](https://docs.databricks.com/error-messages/error-classes.html#unset_nonexistent_properties) |\n| `42K0K` | Invalid inverse distribution function |\n| | [INVALID\\_INVERSE\\_DISTRIBUTION\\_FUNCTION](https://docs.databricks.com/error-messages/invalid-inverse-distribution-function-error-class.html) |\n| `42KD0` | Ambiguous name reference. |\n| | [AMBIGUOUS\\_ALIAS\\_IN\\_NESTED\\_CTE](https://docs.databricks.com/error-messages/error-classes.html#ambiguous_alias_in_nested_cte) |\n| `42KD1` | Operation not supported in READ ONLY session mode. |\n| | [OP\\_NOT\\_SUPPORTED\\_READ\\_ONLY](https://docs.databricks.com/error-messages/error-classes.html#op_not_supported_read_only) |\n| `42KD2` | The source and target table names of a SYNC operation must be the same. |\n| | [SYNC\\_SRC\\_TARGET\\_TBL\\_NOT\\_SAME](https://docs.databricks.com/error-messages/error-classes.html#sync_src_target_tbl_not_same) |\n| `42KD3` | A column cannot be added as specified. |\n| | [DELTA\\_ADD\\_COLUMN\\_AT\\_INDEX\\_LESS\\_THAN\\_ZERO](https://docs.databricks.com/error-messages/error-classes.html#delta_add_column_at_index_less_than_zero), [DELTA\\_ADD\\_COLUMN\\_PARENT\\_NOT\\_STRUCT](https://docs.databricks.com/error-messages/error-classes.html#delta_add_column_parent_not_struct), [DELTA\\_ADD\\_COLUMN\\_STRUCT\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_add_column_struct_not_found) |\n| `42KD4` | Operation not supported because table schema has changed. 
|\n| | [DELTA\\_BLOCK\\_COLUMN\\_MAPPING\\_AND\\_CDC\\_OPERATION](https://docs.databricks.com/error-messages/error-classes.html#delta_block_column_mapping_and_cdc_operation), [DELTA\\_STREAMING\\_INCOMPATIBLE\\_SCHEMA\\_CHANGE](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_incompatible_schema_change), [DELTA\\_STREAMING\\_INCOMPATIBLE\\_SCHEMA\\_CHANGE\\_USE\\_SCHEMA\\_LOG](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_incompatible_schema_change_use_schema_log) |\n| `42KD5` | Cannot create file or path. |\n| | [DELTA\\_CANNOT\\_CREATE\\_LOG\\_PATH](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_create_log_path), [PARTITION\\_LOCATION\\_IS\\_NOT\\_UNDER\\_TABLE\\_DIRECTORY](https://docs.databricks.com/error-messages/error-classes.html#partition_location_is_not_under_table_directory) |\n| `42KD6` | No partition information found. |\n| | [DELTA\\_CONVERSION\\_NO\\_PARTITION\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_conversion_no_partition_found), [DELTA\\_MISSING\\_PARTITION\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_missing_partition_column), [DELTA\\_MISSING\\_PART\\_FILES](https://docs.databricks.com/error-messages/error-classes.html#delta_missing_part_files) |\n| `42KD7` | Table signature mismatch. 
|\n| | [DELTA\\_CREATE\\_TABLE\\_SCHEME\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_create_table_scheme_mismatch), [DELTA\\_CREATE\\_TABLE\\_WITH\\_DIFFERENT\\_CLUSTERING](https://docs.databricks.com/error-messages/error-classes.html#delta_create_table_with_different_clustering), [DELTA\\_CREATE\\_TABLE\\_WITH\\_DIFFERENT\\_PARTITIONING](https://docs.databricks.com/error-messages/error-classes.html#delta_create_table_with_different_partitioning), [DELTA\\_CREATE\\_TABLE\\_WITH\\_DIFFERENT\\_PROPERTY](https://docs.databricks.com/error-messages/error-classes.html#delta_create_table_with_different_property), [DELTA\\_SET\\_LOCATION\\_SCHEMA\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_set_location_schema_mismatch) |\n| `42KD8` | Column position out of range. |\n| | [DELTA\\_DROP\\_COLUMN\\_AT\\_INDEX\\_LESS\\_THAN\\_ZERO](https://docs.databricks.com/error-messages/error-classes.html#delta_drop_column_at_index_less_than_zero), [DELTA\\_INDEX\\_LARGER\\_OR\\_EQUAL\\_THAN\\_STRUCT](https://docs.databricks.com/error-messages/error-classes.html#delta_index_larger_or_equal_than_struct), [DELTA\\_INDEX\\_LARGER\\_THAN\\_STRUCT](https://docs.databricks.com/error-messages/error-classes.html#delta_index_larger_than_struct) |\n| `42KD9` | Cannot infer table schema. |\n| | [CANNOT\\_MERGE\\_SCHEMAS](https://docs.databricks.com/error-messages/error-classes.html#cannot_merge_schemas), [COPY\\_INTO\\_SOURCE\\_SCHEMA\\_INFERENCE\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#copy_into_source_schema_inference_failed), [DELTA\\_FAILED\\_INFER\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#delta_failed_infer_schema), [UNABLE\\_TO\\_INFER\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#unable_to_infer_schema) |\n| `42KDA` | Failed to merge file into table schema. 
|\n| | [DELTA\\_FAILED\\_MERGE\\_SCHEMA\\_FILE](https://docs.databricks.com/error-messages/error-classes.html#delta_failed_merge_schema_file) |\n| `42KDB` | Invalid URL |\n| | [DATA\\_SOURCE\\_URL\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#data_source_url_not_allowed) |\n| `42KDC` | Archived file reference. |\n| | [DELTA\\_ARCHIVED\\_FILES\\_IN\\_LIMIT](https://docs.databricks.com/error-messages/error-classes.html#delta_archived_files_in_limit), [DELTA\\_ARCHIVED\\_FILES\\_IN\\_SCAN](https://docs.databricks.com/error-messages/error-classes.html#delta_archived_files_in_scan) |\n| `42KDD` | Unsupported operation in streaming view. |\n| | [UNEXPECTED\\_OPERATOR\\_IN\\_STREAMING\\_VIEW](https://docs.databricks.com/error-messages/error-classes.html#unexpected_operator_in_streaming_view) |\n| `42KDE` | Unsupported operation on streaming dataset. |\n| | [CALL\\_ON\\_STREAMING\\_DATASET\\_UNSUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#call_on_streaming_dataset_unsupported), [CANNOT\\_CREATE\\_DATA\\_SOURCE\\_TABLE](https://docs.databricks.com/error-messages/cannot-create-data-source-table-error-class.html), [INVALID\\_WRITER\\_COMMIT\\_MESSAGE](https://docs.databricks.com/error-messages/error-classes.html#invalid_writer_commit_message), [NON\\_TIME\\_WINDOW\\_NOT\\_SUPPORTED\\_IN\\_STREAMING](https://docs.databricks.com/error-messages/error-classes.html#non_time_window_not_supported_in_streaming) |\n| `42KDF` | A required routine parameter is missing an argument. |\n| | [MISSING\\_PARAMETER\\_FOR\\_KAFKA](https://docs.databricks.com/error-messages/error-classes.html#missing_parameter_for_kafka), [MISSING\\_PARAMETER\\_FOR\\_ROUTINE](https://docs.databricks.com/error-messages/error-classes.html#missing_parameter_for_routine), [XML\\_ROW\\_TAG\\_MISSING](https://docs.databricks.com/error-messages/error-classes.html#xml_row_tag_missing) |\n| `42KDG` | The target schema is not compatible with the ingested data. 
|\n| | [COPY\\_INTO\\_SCHEMA\\_MISMATCH\\_WITH\\_TARGET\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#copy_into_schema_mismatch_with_target_table) |\n| `42P01` | undefined table |\n| | [DELTA\\_CANNOT\\_REPLACE\\_MISSING\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_cannot_replace_missing_table), [DELTA\\_MISSING\\_DELTA\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_missing_delta_table), [DELTA\\_MISSING\\_DELTA\\_TABLE\\_COPY\\_INTO](https://docs.databricks.com/error-messages/error-classes.html#delta_missing_delta_table_copy_into), [DELTA\\_NO\\_RELATION\\_TABLE](https://docs.databricks.com/error-messages/error-classes.html#delta_no_relation_table), [DELTA\\_TABLE\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_table_not_found), [TABLE\\_OR\\_VIEW\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/table-or-view-not-found-error-class.html), [TABLE\\_WITH\\_ID\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#table_with_id_not_found), [VIEW\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#view_not_found) |\n| `42P02` | undefined parameter |\n| | [UNBOUND\\_SQL\\_PARAMETER](https://docs.databricks.com/error-messages/error-classes.html#unbound_sql_parameter) |\n| `42P06` | duplicate schema |\n| | [SCHEMA\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#schema_already_exists) |\n| `42P07` | duplicate table |\n| | [DELTA\\_TABLE\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#delta_table_already_exists), [TABLE\\_OR\\_VIEW\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#table_or_view_already_exists), [TEMP\\_TABLE\\_OR\\_VIEW\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#temp_table_or_view_already_exists), 
[VIEW\\_ALREADY\\_EXISTS](https://docs.databricks.com/error-messages/error-classes.html#view_already_exists) |\n| `42P10` | invalid column reference |\n| | [DELTA\\_CLUSTERING\\_COLUMNS\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_clustering_columns_mismatch), [DELTA\\_NON\\_PARTITION\\_COLUMN\\_REFERENCE](https://docs.databricks.com/error-messages/error-classes.html#delta_non_partition_column_reference), [DELTA\\_NON\\_PARTITION\\_COLUMN\\_SPECIFIED](https://docs.databricks.com/error-messages/error-classes.html#delta_non_partition_column_specified), [DELTA\\_SHOW\\_PARTITION\\_IN\\_NON\\_PARTITIONED\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_show_partition_in_non_partitioned_column), [DELTA\\_ZORDERING\\_ON\\_PARTITION\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_zordering_on_partition_column) |\n| `42P18` | indeterminate datatype |\n| | [DELTA\\_NULL\\_SCHEMA\\_IN\\_STREAMING\\_WRITE](https://docs.databricks.com/error-messages/error-classes.html#delta_null_schema_in_streaming_write) |\n| `42P20` | windowing error |\n| | [UNSUPPORTED\\_EXPR\\_FOR\\_WINDOW](https://docs.databricks.com/error-messages/error-classes.html#unsupported_expr_for_window) |\n| `42S22` | Column not found |\n| | [NO\\_SQL\\_TYPE\\_IN\\_PROTOBUF\\_SCHEMA](https://docs.databricks.com/error-messages/error-classes.html#no_sql_type_in_protobuf_schema) |\n\n", "chunk_id": "2bd55a59e932de626d754be2a38e1d89", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `44`: with check option violation\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `44000` | with check option violation |\n| | [DELTA\\_REPLACE\\_WHERE\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_replace_where_mismatch) |\n\n#### SQLSTATE 
error codes\n##### Class `46`: Java DDL 1\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `46103` | unresolved class name |\n| | [CANNOT\\_LOAD\\_FUNCTION\\_CLASS](https://docs.databricks.com/error-messages/error-classes.html#cannot_load_function_class) |\n| `46110` | unsupported feature |\n| | [CANNOT\\_MODIFY\\_CONFIG](https://docs.databricks.com/error-messages/error-classes.html#cannot_modify_config) |\n| `46121` | invalid column name |\n| | [INVALID\\_COLUMN\\_NAME\\_AS\\_PATH](https://docs.databricks.com/error-messages/error-classes.html#invalid_column_name_as_path) |\n\n#### SQLSTATE error codes\n##### Class `51`: Invalid Application State\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `51024` | An object cannot be used, because it has been marked inoperative. |\n| | [INCOMPATIBLE\\_VIEW\\_SCHEMA\\_CHANGE](https://docs.databricks.com/error-messages/error-classes.html#incompatible_view_schema_change) |\n\n", "chunk_id": "9f3c5d619969f5ea795bf6cdb5f9001f", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `53`: insufficient resources\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `53200` | out of memory |\n| | [EXECUTOR\\_BROADCAST\\_JOIN\\_OOM](https://docs.databricks.com/error-messages/error-classes.html#executor_broadcast_join_oom), [UNABLE\\_TO\\_ACQUIRE\\_MEMORY](https://docs.databricks.com/error-messages/error-classes.html#unable_to_acquire_memory) |\n\n", "chunk_id": "c119ee55422bee0cf26c4f569c3938da", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `54`: program limit exceeded\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `54000` | program limit exceeded 
|\n| | [COLLECTION\\_SIZE\\_LIMIT\\_EXCEEDED](https://docs.databricks.com/error-messages/collection-size-limit-exceeded-error-class.html), [DELTA\\_CLUSTER\\_BY\\_INVALID\\_NUM\\_COLUMNS](https://docs.databricks.com/error-messages/error-classes.html#delta_cluster_by_invalid_num_columns), [GROUPING\\_SIZE\\_LIMIT\\_EXCEEDED](https://docs.databricks.com/error-messages/error-classes.html#grouping_size_limit_exceeded), [RELATION\\_LARGER\\_THAN\\_8G](https://docs.databricks.com/error-messages/error-classes.html#relation_larger_than_8g) |\n| `54006` | The result string is too long. |\n| | [EXCEED\\_LIMIT\\_LENGTH](https://docs.databricks.com/error-messages/error-classes.html#exceed_limit_length), [KRYO\\_BUFFER\\_OVERFLOW](https://docs.databricks.com/error-messages/error-classes.html#kryo_buffer_overflow) |\n| `54023` | too many arguments |\n| | [TABLE\\_VALUED\\_FUNCTION\\_TOO\\_MANY\\_TABLE\\_ARGUMENTS](https://docs.databricks.com/error-messages/error-classes.html#table_valued_function_too_many_table_arguments) |\n| `54K00` | Maximum depth of nested views was exceeded. |\n| | [VIEW\\_EXCEED\\_MAX\\_NESTED\\_DEPTH](https://docs.databricks.com/error-messages/error-classes.html#view_exceed_max_nested_depth) |\n| `54KD0` | Maximum UDF count in query plan exceeded. |\n| | [UDF\\_LIMITS](https://docs.databricks.com/error-messages/udf-limits-error-class.html), [UDF\\_MAX\\_COUNT\\_EXCEEDED](https://docs.databricks.com/error-messages/error-classes.html#udf_max_count_exceeded) |\n| `54KD1` | Maximum object count in session exceeded. 
|\n| | [MAX\\_NUMBER\\_VARIABLES\\_IN\\_SESSION\\_EXCEEDED](https://docs.databricks.com/error-messages/error-classes.html#max_number_variables_in_session_exceeded) |\n\n", "chunk_id": "319a6c2d1247838280a894637fc3cc51", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `55`: object not in prerequisite state\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `55019` | The object is in an invalid state for the operation. |\n| | [DIFFERENT\\_DELTA\\_TABLE\\_READ\\_BY\\_STREAMING\\_SOURCE](https://docs.databricks.com/error-messages/error-classes.html#different_delta_table_read_by_streaming_source), [EVENT\\_LOG\\_UNAVAILABLE](https://docs.databricks.com/error-messages/error-classes.html#event_log_unavailable), [MATERIALIZED\\_VIEW\\_MESA\\_REFRESH\\_WITHOUT\\_PIPELINE\\_ID](https://docs.databricks.com/error-messages/error-classes.html#materialized_view_mesa_refresh_without_pipeline_id), [STREAMING\\_TABLE\\_NEEDS\\_REFRESH](https://docs.databricks.com/error-messages/error-classes.html#streaming_table_needs_refresh) |\n\n", "chunk_id": "bb62951ccf53e9e96050e3388f44a89d", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `56`: Miscellaneous SQL or Product Error\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `56000` | Miscellaneous SQL or Product Error |\n| | [CHECKPOINT\\_RDD\\_BLOCK\\_ID\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#checkpoint_rdd_block_id_not_found) |\n| `56038` | The requested feature is not supported in this environment. 
|\n| | [AI\\_FUNCTION\\_UNSUPPORTED\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#ai_function_unsupported_error), [ANSI\\_CONFIG\\_CANNOT\\_BE\\_DISABLED](https://docs.databricks.com/error-messages/error-classes.html#ansi_config_cannot_be_disabled), [CF\\_MANAGED\\_FILE\\_EVENTS\\_ONLY\\_ON\\_SERVERLESS](https://docs.databricks.com/error-messages/error-classes.html#cf_managed_file_events_only_on_serverless), [CODEC\\_NOT\\_AVAILABLE](https://docs.databricks.com/error-messages/error-classes.html#codec_not_available), [COLUMN\\_MASKS\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/error-classes.html#column_masks_not_enabled), [DATABRICKS\\_DELTA\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/error-classes.html#databricks_delta_not_enabled), [DELTA\\_MISSING\\_ICEBERG\\_CLASS](https://docs.databricks.com/error-messages/error-classes.html#delta_missing_iceberg_class), [DELTA\\_UNRECOGNIZED\\_INVARIANT](https://docs.databricks.com/error-messages/error-classes.html#delta_unrecognized_invariant), [DELTA\\_UNSUPPORTED\\_FEATURES\\_FOR\\_READ](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_features_for_read), [DELTA\\_UNSUPPORTED\\_FEATURES\\_FOR\\_WRITE](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_features_for_write), [DELTA\\_UNSUPPORTED\\_FEATURES\\_IN\\_CONFIG](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_features_in_config), [DLT\\_EXPECTATIONS\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#dlt_expectations_not_supported), [DLT\\_VIEW\\_CLUSTER\\_BY\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#dlt_view_cluster_by_not_supported), [DLT\\_VIEW\\_LOCATION\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#dlt_view_location_not_supported), 
[DLT\\_VIEW\\_SCHEMA\\_WITH\\_TYPE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#dlt_view_schema_with_type_not_supported), [DLT\\_VIEW\\_TABLE\\_CONSTRAINTS\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#dlt_view_table_constraints_not_supported), [FEATURE\\_NOT\\_ON\\_CLASSIC\\_WAREHOUSE](https://docs.databricks.com/error-messages/error-classes.html#feature_not_on_classic_warehouse), [FEATURE\\_UNAVAILABLE](https://docs.databricks.com/error-messages/error-classes.html#feature_unavailable), [GET\\_TABLES\\_BY\\_TYPE\\_UNSUPPORTED\\_BY\\_HIVE\\_VERSION](https://docs.databricks.com/error-messages/error-classes.html#get_tables_by_type_unsupported_by_hive_version), [INCOMPATIBLE\\_DATASOURCE\\_REGISTER](https://docs.databricks.com/error-messages/error-classes.html#incompatible_datasource_register), [MATERIALIZED\\_VIEW\\_OPERATION\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/materialized-view-operation-not-allowed-error-class.html), [NATIVE\\_XML\\_DATA\\_SOURCE\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/error-classes.html#native_xml_data_source_not_enabled), [STREAMING\\_TABLE\\_NOT\\_SUPPORTED](https://docs.databricks.com/error-messages/error-classes.html#streaming_table_not_supported), [UC\\_LAKEHOUSE\\_FEDERATION\\_WRITES\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#uc_lakehouse_federation_writes_not_allowed), [UC\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/error-classes.html#uc_not_enabled), [UC\\_QUERY\\_FEDERATION\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/error-classes.html#uc_query_federation_not_enabled), [UC\\_VOLUMES\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/error-classes.html#uc_volumes_not_enabled), [UC\\_VOLUMES\\_SHARING\\_NOT\\_ENABLED](https://docs.databricks.com/error-messages/error-classes.html#uc_volumes_sharing_not_enabled), 
[UNSUPPORTED\\_TIMESERIES\\_COLUMNS](https://docs.databricks.com/error-messages/error-classes.html#unsupported_timeseries_columns) |\n| `56098` | An error occurred during implicit rebind, recompile, or revalidation. |\n| | [UC\\_INVALID\\_DEPENDENCIES](https://docs.databricks.com/error-messages/error-classes.html#uc_invalid_dependencies) |\n| `56K00` | Spark Connect error |\n| | [CONNECT](https://docs.databricks.com/error-messages/connect-error-class.html) |\n\n", "chunk_id": "780e36e664595661e3c44e77dfb7eb9a", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `57`: operator intervention\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `57012` | A non-database resource is not available. This will not affect the successful execution of subsequent statements. |\n| | [REMOTE\\_FUNCTION\\_HTTP\\_FAILED\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#remote_function_http_failed_error), [REMOTE\\_FUNCTION\\_HTTP\\_RESULT\\_UNEXPECTED\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#remote_function_http_result_unexpected_error), [REMOTE\\_FUNCTION\\_HTTP\\_RETRY\\_TIMEOUT](https://docs.databricks.com/error-messages/error-classes.html#remote_function_http_retry_timeout), [REMOTE\\_FUNCTION\\_MISSING\\_REQUIREMENTS\\_ERROR](https://docs.databricks.com/error-messages/error-classes.html#remote_function_missing_requirements_error) |\n\n", "chunk_id": "8602301708fe962ff0d4fba1507e78c3", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `58`: System error\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `58030` | I/O error |\n| | 
[CANNOT\\_LOAD\\_STATE\\_STORE](https://docs.databricks.com/error-messages/cannot-load-state-store-error-class.html), [CANNOT\\_RESTORE\\_PERMISSIONS\\_FOR\\_PATH](https://docs.databricks.com/error-messages/error-classes.html#cannot_restore_permissions_for_path), [CANNOT\\_WRITE\\_STATE\\_STORE](https://docs.databricks.com/error-messages/cannot-write-state-store-error-class.html), [FAILED\\_RENAME\\_TEMP\\_FILE](https://docs.databricks.com/error-messages/error-classes.html#failed_rename_temp_file), [INVALID\\_BUCKET\\_FILE](https://docs.databricks.com/error-messages/error-classes.html#invalid_bucket_file), [TASK\\_WRITE\\_FAILED](https://docs.databricks.com/error-messages/error-classes.html#task_write_failed), [UNABLE\\_TO\\_FETCH\\_HIVE\\_TABLES](https://docs.databricks.com/error-messages/error-classes.html#unable_to_fetch_hive_tables) |\n\n#### SQLSTATE error codes\n##### Class `82`: out of memory\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `82100` | out of memory (could not allocate) |\n| | [DELTA\\_BLOOM\\_FILTER\\_OOM\\_ON\\_WRITE](https://docs.databricks.com/error-messages/error-classes.html#delta_bloom_filter_oom_on_write) |\n\n", "chunk_id": "a36201d90a304d52ead2a81f98375ea4", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `F0`: configuration file error\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `F0000` | config file error |\n| | [DELTA\\_INCONSISTENT\\_LOGSTORE\\_CONFS](https://docs.databricks.com/error-messages/error-classes.html#delta_inconsistent_logstore_confs), [DELTA\\_INVALID\\_LOGSTORE\\_CONF](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_logstore_conf), [DELTA\\_UNKNOWN\\_CONFIGURATION](https://docs.databricks.com/error-messages/error-classes.html#delta_unknown_configuration), 
[INVALID\\_DRIVER\\_MEMORY](https://docs.databricks.com/error-messages/error-classes.html#invalid_driver_memory), [INVALID\\_EXECUTOR\\_MEMORY](https://docs.databricks.com/error-messages/error-classes.html#invalid_executor_memory), [INVALID\\_KRYO\\_SERIALIZER\\_BUFFER\\_SIZE](https://docs.databricks.com/error-messages/error-classes.html#invalid_kryo_serializer_buffer_size) |\n\n#### SQLSTATE error codes\n##### Class `HV`: FDW-specific condition\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `HV000` | FDW-specific condition |\n| | [FAILED\\_JDBC](https://docs.databricks.com/error-messages/failed-jdbc-error-class.html) |\n\n", "chunk_id": "7c96b7afa7af8ce54db17220b5031dbf", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `HY`: CLI-specific condition\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `HY000` | CLI-specific condition |\n| | [INVALID\\_HANDLE](https://docs.databricks.com/error-messages/invalid-handle-error-class.html) |\n| `HY008` | operation canceled |\n| | [OPERATION\\_CANCELED](https://docs.databricks.com/error-messages/error-classes.html#operation_canceled) |\n| `HY109` | invalid cursor position |\n| | [INVALID\\_CURSOR](https://docs.databricks.com/error-messages/invalid-cursor-error-class.html) |\n\n", "chunk_id": "6a260894936979b3a2fd9ef431d49057", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `KD`: datasource specific errors\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `KD000` | datasource specific errors |\n| | [DC\\_CONNECTION\\_ERROR](https://docs.databricks.com/error-messages/dc-connection-error-error-class.html), 
[DC\\_CONNECTOR\\_ERROR](https://docs.databricks.com/error-messages/dc-connector-error-error-class.html), [DC\\_FILE\\_BACKUP\\_NOT\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#dc_file_backup_not_found), [DC\\_GA4\\_RAW\\_DATA\\_ERROR](https://docs.databricks.com/error-messages/dc-ga4-raw-data-error-error-class.html), [DC\\_INVALID\\_OFFSET](https://docs.databricks.com/error-messages/error-classes.html#dc_invalid_offset), [DC\\_SFDC\\_API\\_DAILY\\_QUOTA\\_THRESHOLD\\_EXCEEDED](https://docs.databricks.com/error-messages/error-classes.html#dc_sfdc_api_daily_quota_threshold_exceeded), [DC\\_SFDC\\_API\\_ERROR](https://docs.databricks.com/error-messages/dc-sfdc-api-error-error-class.html), [DC\\_SQLSERVER\\_ERROR](https://docs.databricks.com/error-messages/dc-sqlserver-error-error-class.html), [DC\\_WORKDAY\\_RAAS\\_API\\_ERROR](https://docs.databricks.com/error-messages/dc-workday-raas-api-error-error-class.html), [END\\_OFFSET\\_HAS\\_GREATER\\_OFFSET\\_FOR\\_TOPIC\\_PARTITION\\_THAN\\_LATEST\\_WITH\\_TRIGGER\\_AVAILABLENOW](https://docs.databricks.com/error-messages/error-classes.html#end_offset_has_greater_offset_for_topic_partition_than_latest_with_trigger_availablenow), [END\\_OFFSET\\_HAS\\_GREATER\\_OFFSET\\_FOR\\_TOPIC\\_PARTITION\\_THAN\\_PREFETCHED](https://docs.databricks.com/error-messages/error-classes.html#end_offset_has_greater_offset_for_topic_partition_than_prefetched), [FAILED\\_REGISTER\\_CLASS\\_WITH\\_KRYO](https://docs.databricks.com/error-messages/error-classes.html#failed_register_class_with_kryo), [GRAPHITE\\_SINK\\_INVALID\\_PROTOCOL](https://docs.databricks.com/error-messages/error-classes.html#graphite_sink_invalid_protocol), [GRAPHITE\\_SINK\\_PROPERTY\\_MISSING](https://docs.databricks.com/error-messages/error-classes.html#graphite_sink_property_missing), [INCOMPATIBLE\\_DATA\\_FOR\\_TABLE](https://docs.databricks.com/error-messages/incompatible-data-for-table-error-class.html), 
[LOST\\_TOPIC\\_PARTITIONS\\_IN\\_END\\_OFFSET\\_WITH\\_TRIGGER\\_AVAILABLENOW](https://docs.databricks.com/error-messages/error-classes.html#lost_topic_partitions_in_end_offset_with_trigger_availablenow), [MALFORMED\\_AVRO\\_MESSAGE](https://docs.databricks.com/error-messages/error-classes.html#malformed_avro_message), [MALFORMED\\_CSV\\_RECORD](https://docs.databricks.com/error-messages/error-classes.html#malformed_csv_record), [MISMATCHED\\_TOPIC\\_PARTITIONS\\_BETWEEN\\_END\\_OFFSET\\_AND\\_PREFETCHED](https://docs.databricks.com/error-messages/error-classes.html#mismatched_topic_partitions_between_end_offset_and_prefetched) |\n| `KD001` | Cannot read file footer |\n| | [CANNOT\\_READ\\_FILE\\_FOOTER](https://docs.databricks.com/error-messages/error-classes.html#cannot_read_file_footer), [DELTA\\_FAILED\\_READ\\_FILE\\_FOOTER](https://docs.databricks.com/error-messages/error-classes.html#delta_failed_read_file_footer) |\n| `KD002` | Unexpected version |\n| | [DELTA\\_FAILED\\_SCAN\\_WITH\\_HISTORICAL\\_VERSION](https://docs.databricks.com/error-messages/error-classes.html#delta_failed_scan_with_historical_version), [DELTA\\_MISSING\\_CHANGE\\_DATA](https://docs.databricks.com/error-messages/error-classes.html#delta_missing_change_data), [DELTA\\_STREAMING\\_CANNOT\\_CONTINUE\\_PROCESSING\\_POST\\_SCHEMA\\_EVOLUTION](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_cannot_continue_processing_post_schema_evolution), [DELTA\\_STREAMING\\_CHECK\\_COLUMN\\_MAPPING\\_NO\\_SNAPSHOT](https://docs.databricks.com/error-messages/error-classes.html#delta_streaming_check_column_mapping_no_snapshot) |\n| `KD003` | Incorrect access to data type |\n| | [CANNOT\\_READ\\_ARCHIVED\\_FILE](https://docs.databricks.com/error-messages/error-classes.html#cannot_read_archived_file), [CANNOT\\_READ\\_FILE](https://docs.databricks.com/error-messages/cannot-read-file-error-class.html), 
[DELTA\\_FOUND\\_MAP\\_TYPE\\_COLUMN](https://docs.databricks.com/error-messages/error-classes.html#delta_found_map_type_column), [DELTA\\_INCORRECT\\_ARRAY\\_ACCESS](https://docs.databricks.com/error-messages/error-classes.html#delta_incorrect_array_access), [DELTA\\_INCORRECT\\_ARRAY\\_ACCESS\\_BY\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#delta_incorrect_array_access_by_name), [UNKNOWN\\_FIELD\\_EXCEPTION](https://docs.databricks.com/error-messages/unknown-field-exception-error-class.html) |\n| `KD004` | Delta protocol version error |\n| | [DELTA\\_INVALID\\_PROTOCOL\\_DOWNGRADE](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_protocol_downgrade), [DELTA\\_INVALID\\_PROTOCOL\\_VERSION](https://docs.databricks.com/error-messages/error-classes.html#delta_invalid_protocol_version), [DELTA\\_READ\\_FEATURE\\_PROTOCOL\\_REQUIRES\\_WRITE](https://docs.databricks.com/error-messages/error-classes.html#delta_read_feature_protocol_requires_write), [DELTA\\_UNSUPPORTED\\_COLUMN\\_MAPPING\\_PROTOCOL](https://docs.databricks.com/error-messages/error-classes.html#delta_unsupported_column_mapping_protocol) |\n| `KD005` | Table must include at least one non partition column |\n| | [ALL\\_PARTITION\\_COLUMNS\\_NOT\\_ALLOWED](https://docs.databricks.com/error-messages/error-classes.html#all_partition_columns_not_allowed), [DELTA\\_NON\\_PARTITION\\_COLUMN\\_ABSENT](https://docs.databricks.com/error-messages/error-classes.html#delta_non_partition_column_absent) |\n| `KD006` | No commits found at log path |\n| | [DELTA\\_NO\\_COMMITS\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_no_commits_found), [DELTA\\_NO\\_RECREATABLE\\_HISTORY\\_FOUND](https://docs.databricks.com/error-messages/error-classes.html#delta_no_recreatable_history_found), [STDS\\_COMMITTED\\_BATCH\\_UNAVAILABLE](https://docs.databricks.com/error-messages/error-classes.html#stds_committed_batch_unavailable), 
[STDS\\_NO\\_PARTITION\\_DISCOVERED\\_IN\\_STATE\\_STORE](https://docs.databricks.com/error-messages/error-classes.html#stds_no_partition_discovered_in_state_store), [STDS\\_OFFSET\\_LOG\\_UNAVAILABLE](https://docs.databricks.com/error-messages/error-classes.html#stds_offset_log_unavailable), [STDS\\_OFFSET\\_METADATA\\_LOG\\_UNAVAILABLE](https://docs.databricks.com/error-messages/error-classes.html#stds_offset_metadata_log_unavailable) |\n| `KD007` | Table signature changed |\n| | [DELTA\\_SCHEMA\\_CHANGED](https://docs.databricks.com/error-messages/error-classes.html#delta_schema_changed), [DELTA\\_SCHEMA\\_CHANGED\\_WITH\\_STARTING\\_OPTIONS](https://docs.databricks.com/error-messages/error-classes.html#delta_schema_changed_with_starting_options), [DELTA\\_SCHEMA\\_CHANGED\\_WITH\\_VERSION](https://docs.databricks.com/error-messages/error-classes.html#delta_schema_changed_with_version), [DELTA\\_SCHEMA\\_CHANGE\\_SINCE\\_ANALYSIS](https://docs.databricks.com/error-messages/error-classes.html#delta_schema_change_since_analysis), [DELTA\\_TABLE\\_ID\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#delta_table_id_mismatch) |\n| `KD008` | Table signature not set |\n| | [DELTA\\_SCHEMA\\_NOT\\_SET](https://docs.databricks.com/error-messages/error-classes.html#delta_schema_not_set) |\n| `KD009` | Partitions do not match |\n| | [DELTA\\_UNEXPECTED\\_NUM\\_PARTITION\\_COLUMNS\\_FROM\\_FILE\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#delta_unexpected_num_partition_columns_from_file_name), [DELTA\\_UNEXPECTED\\_PARTITION\\_COLUMN\\_FROM\\_FILE\\_NAME](https://docs.databricks.com/error-messages/error-classes.html#delta_unexpected_partition_column_from_file_name), [DELTA\\_UNEXPECTED\\_PARTITION\\_SCHEMA\\_FROM\\_USER](https://docs.databricks.com/error-messages/error-classes.html#delta_unexpected_partition_schema_from_user) |\n| `KD00A` | Unexpected partial scan |\n| | 
[DELTA\\_UNEXPECTED\\_PARTIAL\\_SCAN](https://docs.databricks.com/error-messages/error-classes.html#delta_unexpected_partial_scan) |\n| `KD00B` | Unrecognised file |\n| | [DELTA\\_UNRECOGNIZED\\_LOGFILE](https://docs.databricks.com/error-messages/error-classes.html#delta_unrecognized_logfile), [ERROR\\_READING\\_AVRO\\_UNKNOWN\\_FINGERPRINT](https://docs.databricks.com/error-messages/error-classes.html#error_reading_avro_unknown_fingerprint) |\n| `KD00C` | Versioning not contiguous |\n| | [DELTA\\_VERSIONS\\_NOT\\_CONTIGUOUS](https://docs.databricks.com/error-messages/delta-versions-not-contiguous-error-class.html) |\n| `KD00D` | Stats required |\n| | [DELTA\\_ZORDERING\\_ON\\_COLUMN\\_WITHOUT\\_STATS](https://docs.databricks.com/error-messages/error-classes.html#delta_zordering_on_column_without_stats) |\n| `KD00E` | table feature validation failure |\n| | [DELTA\\_ICEBERG\\_COMPAT\\_VIOLATION](https://docs.databricks.com/error-messages/delta-iceberg-compat-violation-error-class.html), [DELTA\\_UNIVERSAL\\_FORMAT\\_VIOLATION](https://docs.databricks.com/error-messages/error-classes.html#delta_universal_format_violation) |\n\n", "chunk_id": "4198b81de6a0ef24dc0148e7f2132591", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Databricks reference documentation\n## Error handling in Databricks\n#### SQLSTATE error codes\n##### Class `P0`: procedural logic error\n\n| SQLSTATE | Description and issuing error classes |\n| --- | --- |\n| `P0001` | raise exception |\n| | [USER\\_RAISED\\_EXCEPTION](https://docs.databricks.com/error-messages/error-classes.html#user_raised_exception), [USER\\_RAISED\\_EXCEPTION\\_PARAMETER\\_MISMATCH](https://docs.databricks.com/error-messages/error-classes.html#user_raised_exception_parameter_mismatch), [USER\\_RAISED\\_EXCEPTION\\_UNKNOWN\\_ERROR\\_CLASS](https://docs.databricks.com/error-messages/error-classes.html#user_raised_exception_unknown_error_class) |\n\n", "chunk_id": 
"fec94328cfb459e2924f295cfbe9086e", "url": "https://docs.databricks.com/error-messages/sqlstates.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n", "chunk_id": "588689b77102abf1c5a83cc9756ee1d6", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n\nThis article is a reference for Databricks Utilities (`dbutils`). `dbutils` utilities are available in Python, R, and Scala notebooks. You can use the utilities to: \n* Work with files and object storage efficiently.\n* Work with secrets. \n**How to**: [List utilities](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-utilities), [list commands](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-commands), [display command help](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-help) \n**Utilities**: [credentials](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-credentials), [data](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-data), [fs](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs), [jobs](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-jobs), [library](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-library), [notebook](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-workflow), [secrets](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-secrets), [widgets](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-widgets), [Utilities API library](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-api)\n\n", "chunk_id": "82c850670f128c13f3e4cc27ae01b5dc", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks 
Utilities (`dbutils`) reference\n##### List available utilities\n\nTo list available utilities along with a short description for each utility, run `dbutils.help()` for Python or Scala. \nThis example lists the available utilities. \n```\ndbutils.help()\n\n``` \n```\nThis module provides various utilities for users to interact with the rest of Databricks.\n\ncredentials: DatabricksCredentialUtils-> Utilities for interacting with credentials within notebooks\ndata: DataUtils-> Utilities for understanding and interacting with datasets (EXPERIMENTAL)\nfs: DbfsUtils-> Manipulates the Databricks filesystem (DBFS) from the console\njobs: JobsUtils-> Utilities for leveraging jobs features\nlibrary: LibraryUtils-> Utilities for session isolated libraries\nmeta: MetaUtils-> Methods to hook into the compiler (EXPERIMENTAL)\nnotebook: NotebookUtils-> Utilities for the control flow of a notebook (EXPERIMENTAL)\npreview: Preview-> Utilities under preview category\nsecrets: SecretUtils-> Provides utilities for leveraging secrets within notebooks\nwidgets: WidgetsUtils-> Methods to create and get bound value of input widgets inside notebooks\n\n```\n\n", "chunk_id": "41a606131f8650e106a6a8aae47a2a9b", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n##### List available commands for a utility\n\nTo list available commands for a utility along with a short description of each command, run `.help()` after the programmatic name for the utility. \nThis example lists available commands for the Databricks File System (DBFS) utility. \n```\ndbutils.fs.help()\n\n``` \n```\ndbutils.fs provides utilities for working with FileSystems. 
Most methods in this package can take either a DBFS path (e.g., \"/foo\" or \"dbfs:/foo\"), or another FileSystem URI. For more info about a method, use dbutils.fs.help(\"methodName\"). In notebooks, you can also use the %fs shorthand to access DBFS. The %fs shorthand maps straightforwardly onto dbutils calls. For example, \"%fs head --maxBytes=10000 /file/path\" translates into \"dbutils.fs.head(\"/file/path\", maxBytes = 10000)\".\n\nfsutils\n\ncp(from: String, to: String, recurse: boolean = false): boolean -> Copies a file or directory, possibly across FileSystems\nhead(file: String, maxBytes: int = 65536): String -> Returns up to the first 'maxBytes' bytes of the given file as a String encoded in UTF-8\nls(dir: String): Seq -> Lists the contents of a directory\nmkdirs(dir: String): boolean -> Creates the given directory if it does not exist, also creating any necessary parent directories\nmv(from: String, to: String, recurse: boolean = false): boolean -> Moves a file or directory, possibly across FileSystems\nput(file: String, contents: String, overwrite: boolean = false): boolean -> Writes the given String out to a file, encoded in UTF-8\nrm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory\n\nmount\n\nmount(source: String, mountPoint: String, encryptionType: String = \"\", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Mounts the given source directory into DBFS at the given mount point\nmounts: Seq -> Displays information about what is mounted within DBFS\nrefreshMounts: boolean -> Forces all machines in this cluster to refresh their mount cache, ensuring they receive the most recent information\nunmount(mountPoint: String): boolean -> Deletes a DBFS mount point\nupdateMount(source: String, mountPoint: String, encryptionType: String = \"\", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Similar to mount(), but updates an existing mount point instead of creating a 
new one\n\n```\n\n", "chunk_id": "2b39787aa76e4c2785cb2ee59ba10e21", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n##### Display help for a command\n\nTo display help for a command, run `.help(\"<command-name>\")` after the programmatic name for the utility. \nThis example displays help for the DBFS copy command. \n```\ndbutils.fs.help(\"cp\")\n\n``` \n```\n/**\n* Copies a file or directory, possibly across FileSystems.\n*\n* Example: cp(\"/mnt/my-folder/a\", \"dbfs:/a/b\")\n*\n* @param from FileSystem URI of the source file or directory\n* @param to FileSystem URI of the destination file or directory\n* @param recurse if true, all files and directories will be recursively copied\n* @return true if all files were successfully copied\n*/\ncp(from: java.lang.String, to: java.lang.String, recurse: boolean = false): boolean\n\n```\n\n", "chunk_id": "77ada2043150e605c1717be705d689ac", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n##### Credentials utility (dbutils.credentials)\n\n**Commands**: [assumeRole](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-credentials-assumerole), [showCurrentRole](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-credentials-showcurrentrole), [showRoles](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-credentials-showroles) \nThe credentials utility allows you to interact with credentials within notebooks. This utility is usable only on clusters with [credential passthrough](https://docs.databricks.com/archive/credential-passthrough/iam-passthrough.html) enabled. To list the available commands, run `dbutils.credentials.help()`. 
\n```\nassumeRole(role: String): boolean -> Sets the role ARN to assume when looking for credentials to authenticate with S3\nshowCurrentRole: List -> Shows the currently set role\nshowRoles: List -> Shows the set of possible assumed roles\n\n``` \n### assumeRole command (dbutils.credentials.assumeRole) \nSets the Amazon Resource Name (ARN) for the AWS Identity and Access Management (IAM) role to assume when looking for credentials to authenticate with Amazon S3. After you run this command, you can run S3 access commands, such as `sc.textFile(\"s3a://my-bucket/my-file.csv\")`, to access an object. \nTo display help for this command, run `dbutils.credentials.help(\"assumeRole\")`. \n```\ndbutils.credentials.assumeRole(\"arn:aws:iam::123456789012:role/my-role\")\n\n# Out[1]: True\n\n``` \n```\ndbutils.credentials.assumeRole(\"arn:aws:iam::123456789012:role/my-role\")\n\n# TRUE\n\n``` \n```\ndbutils.credentials.assumeRole(\"arn:aws:iam::123456789012:role/my-role\")\n\n// res0: Boolean = true\n\n``` \n### showCurrentRole command (dbutils.credentials.showCurrentRole) \nLists the currently set AWS Identity and Access Management (IAM) role. \nTo display help for this command, run `dbutils.credentials.help(\"showCurrentRole\")`. \n```\ndbutils.credentials.showCurrentRole()\n\n# Out[1]: ['arn:aws:iam::123456789012:role/my-role-a']\n\n``` \n```\ndbutils.credentials.showCurrentRole()\n\n# [[1]]\n# [1] \"arn:aws:iam::123456789012:role/my-role-a\"\n\n``` \n```\ndbutils.credentials.showCurrentRole()\n\n// res0: java.util.List[String] = [arn:aws:iam::123456789012:role/my-role-a]\n\n``` \n### showRoles command (dbutils.credentials.showRoles) \nLists the set of possible assumed AWS Identity and Access Management (IAM) roles. \nTo display help for this command, run `dbutils.credentials.help(\"showRoles\")`. 
\n```\ndbutils.credentials.showRoles()\n\n# Out[1]: ['arn:aws:iam::123456789012:role/my-role-a', 'arn:aws:iam::123456789012:role/my-role-b']\n\n``` \n```\ndbutils.credentials.showRoles()\n\n# [[1]]\n# [1] \"arn:aws:iam::123456789012:role/my-role-a\"\n#\n# [[2]]\n# [1] \"arn:aws:iam::123456789012:role/my-role-b\"\n\n``` \n```\ndbutils.credentials.showRoles()\n\n// res0: java.util.List[String] = [arn:aws:iam::123456789012:role/my-role-a, arn:aws:iam::123456789012:role/my-role-b]\n\n```\n\n", "chunk_id": "e072e52268f229f0373115238c07f637", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n##### Data utility (dbutils.data)\n\nPreview \nThis feature is in [Public Preview](https://docs.databricks.com/release-notes/release-types.html). \nNote \nAvailable in Databricks Runtime 9.0 and above. \n**Commands**: [summarize](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-data-summarize) \nThe data utility allows you to understand and interpret datasets. To list the available commands, run `dbutils.data.help()`. \n```\ndbutils.data provides utilities for understanding and interpreting datasets. This module is currently in preview and may be unstable. For more info about a method, use dbutils.data.help(\"methodName\").\n\nsummarize(df: Object, precise: boolean): void -> Summarize a Spark DataFrame and visualize the statistics to get quick insights\n\n``` \n### summarize command (dbutils.data.summarize) \nCalculates and displays summary statistics of an Apache Spark DataFrame or pandas DataFrame. This command is available for Python, Scala and R. \nCaution \nThis command analyzes the complete contents of the DataFrame. Running this command for very large DataFrames can be very expensive. \nTo display help for this command, run `dbutils.data.help(\"summarize\")`. 
\nIn Databricks Runtime 10.4 LTS and above, you can use the additional `precise` parameter to adjust the precision of the computed statistics. \nNote \nThis feature is in [Public Preview](https://docs.databricks.com/release-notes/release-types.html). \n* When `precise` is set to false (the default), some returned statistics include approximations to reduce run time. \n+ The number of distinct values for categorical columns may have ~5% relative error for high-cardinality columns.\n+ The frequent value counts may have an error of up to 0.01% when the number of distinct values is greater than 10000.\n+ The histograms and percentile estimates may have an error of up to 0.01% relative to the total number of rows.\n* When `precise` is set to true, the statistics are computed with higher precision. All statistics except for the histograms and percentiles for numeric columns are now exact. \n+ The histograms and percentile estimates may have an error of up to 0.0001% relative to the total number of rows. \nThe tooltip at the top of the data summary output indicates the mode of the current run. \nThis example displays summary statistics for an Apache Spark DataFrame with approximations enabled by default. To see the results, run this command in a notebook. This example is based on [Sample datasets](https://docs.databricks.com/discover/databricks-datasets.html). 
\n```\ndf = spark.read.format('csv').load(\n'/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv',\nheader=True,\ninferSchema=True\n)\ndbutils.data.summarize(df)\n\n``` \n```\ndf <- read.df(\"/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv\", source = \"csv\", header=\"true\", inferSchema = \"true\")\ndbutils.data.summarize(df)\n\n``` \n```\nval df = spark.read.format(\"csv\")\n.option(\"inferSchema\", \"true\")\n.option(\"header\", \"true\")\n.load(\"/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv\")\ndbutils.data.summarize(df)\n\n``` \nNote that the visualization uses [SI notation](https://en.wikipedia.org/wiki/International_System_of_Units#Prefixes) to concisely render numerical values smaller than 0.01 or larger than 10000. As an example, the numerical value `1.25e-15` will be rendered as `1.25f`. One exception: the visualization uses \u201c`B`\u201d for `1.0e9` ([giga](https://en.wikipedia.org/wiki/Giga-)) instead of \u201c`G`\u201d.\n\n", "chunk_id": "76e9e78ed7cf77d5208a5937a4f93623", "url": "https://docs.databricks.com/dev-tools/databricks-utils.html"} +{"chunked_text": "# Develop on Databricks\n## Developer tools and guidance\n#### Databricks Utilities (`dbutils`) reference\n##### File system utility (dbutils.fs)\n\nWarning \nThe Python implementation of all `dbutils.fs` methods uses `snake_case` rather than `camelCase` for keyword formatting. \nFor example: while `dbutils.fs.help()` displays the option `extraConfigs` for `dbutils.fs.mount()`, in Python you would use the keyword `extra_configs`. 
\n**Commands**: [cp](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-cp), [head](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-head), [ls](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-ls), [mkdirs](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-mkdirs), [mount](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-mount), [mounts](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-mounts), [mv](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-mv), [put](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-put), [refreshMounts](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-refreshmounts), [rm](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-rm), [unmount](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-unmount), [updateMount](https://docs.databricks.com/dev-tools/databricks-utils.html#dbutils-fs-updatemount) \nThe file system utility allows you to access [DBFS](https://docs.databricks.com/dbfs/index.html), making it easier to use Databricks as a file system. To list the available commands, run `dbutils.fs.help()`. \n```\ndbutils.fs provides utilities for working with FileSystems. Most methods in this package can take either a DBFS path (e.g., \"/foo\" or \"dbfs:/foo\"), or another FileSystem URI. For more info about a method, use dbutils.fs.help(\"methodName\"). In notebooks, you can also use the %fs shorthand to access DBFS. The %fs shorthand maps straightforwardly onto dbutils calls. 
For example, \"%fs head --maxBytes=10000 /file/path\" translates into \"dbutils.fs.head(\"/file/path\", maxBytes = 10000)\".\n\nfsutils\n\ncp(from: String, to: String, recurse: boolean = false): boolean -> Copies a file or directory, possibly across FileSystems\nhead(file: String, maxBytes: int = 65536): String -> Returns up to the first 'maxBytes' bytes of the given file as a String encoded in UTF-8\nls(dir: String): Seq -> Lists the contents of a directory\nmkdirs(dir: String): boolean -> Creates the given directory if it does not exist, also creating any necessary parent directories\nmv(from: String, to: String, recurse: boolean = false): boolean -> Moves a file or directory, possibly across FileSystems\nput(file: String, contents: String, overwrite: boolean = false): boolean -> Writes the given String out to a file, encoded in UTF-8\nrm(dir: String, recurse: boolean = false): boolean -> Removes a file or directory\n\nmount\n\nmount(source: String, mountPoint: String, encryptionType: String = \"\", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Mounts the given source directory into DBFS at the given mount point\nmounts: Seq -> Displays information about what is mounted within DBFS\nrefreshMounts: boolean -> Forces all machines in this cluster to refresh their mount cache, ensuring they receive the most recent information\nunmount(mountPoint: String): boolean -> Deletes a DBFS mount point\nupdateMount(source: String, mountPoint: String, encryptionType: String = \"\", owner: String = null, extraConfigs: Map = Map.empty[String, String]): boolean -> Similar to mount(), but updates an existing mount point instead of creating a new one\n\n``` \n### cp command (dbutils.fs.cp) \nCopies a file or directory, possibly across filesystems. \nTo display help for this command, run `dbutils.fs.help(\"cp\")`. \nThis example copies the file named `data.csv` from `/Volumes/main/default/my-volume/` to `new-data.csv` in the same volume. 
\n```\ndbutils.fs.cp(\"/Volumes/main/default/my-volume/data.csv\", \"/Volumes/main/default/my-volume/new-data.csv\")\n\n# Out[4]: True\n\n``` \n```\ndbutils.fs.cp(\"/Volumes/main/default/my-volume/data.csv\", \"/Volumes/main/default/my-volume/new-data.csv\")\n\n# [1] TRUE\n\n``` \n```\ndbutils.fs.cp(\"/Volumes/main/default/my-volume/data.csv\", \"/Volumes/main/default/my-volume/new-data.csv\")\n\n// res3: Boolean = true\n\n``` \n### head command (dbutils.fs.head) \nReturns up to the specified maximum number of bytes of the given file. The bytes are returned as a UTF-8 encoded string. \nTo display help for this command, run `dbutils.fs.help(\"head\")`. \nThis example displays the first 25 bytes of the file `data.csv` located in `/Volumes/main/default/my-volume/`. \n```\ndbutils.fs.head(\"/Volumes/main/default/my-volume/data.csv\", 25)\n\n# [Truncated to first 25 bytes]\n# Out[12]: 'Year,First Name,County,Se'\n\n``` \n```\ndbutils.fs.head(\"/Volumes/main/default/my-volume/data.csv\", 25)\n\n# [1] \"Year,First Name,County,Se\"\n\n``` \n```\ndbutils.fs.head(\"/Volumes/main/default/my-volume/data.csv\", 25)\n\n// [Truncated to first 25 bytes]\n// res4: String =\n// \"Year,First Name,County,Se\"\n\n``` \n### ls command (dbutils.fs.ls) \nLists the contents of a directory. \nTo display help for this command, run `dbutils.fs.help(\"ls\")`. \nThis example displays information about the contents of `/Volumes/main/default/my-volume/`. The `modificationTime` field is available in Databricks Runtime 10.4 LTS and above. In R, `modificationTime` is returned as a string. \n```\ndbutils.fs.ls(\"/Volumes/main/default/my-volume/\")\n\n# Out[13]: [FileInfo(path='dbfs:/Volumes/main/default/my-volume/data.csv', name='data.csv', size=2258987, modificationTime=1711357839000)]\n\n``` \n```\ndbutils.fs.ls(\"/Volumes/main/default/my-volume/\")\n\n# For prettier results from dbutils.fs.ls(