docs/integrations/engines/databricks.md (4 additions, 6 deletions)
@@ -14,9 +14,9 @@ SQLMesh connects to Databricks with the [Databricks SQL Connector](https://docs.
The SQL Connector is bundled with SQLMesh and automatically installed when you include the `databricks` extra in the command `pip install "sqlmesh[databricks]"`.
-The SQL Connector has all the functionality needed for SQLMesh to execute SQL models on Databricks and Python models that do not return PySpark DataFrames.
+The SQL Connector has all the functionality needed for SQLMesh to execute SQL models on Databricks and Python models locally (the default SQLMesh approach).
-If you have Python models returning PySpark DataFrames, check out the [Databricks Connect](#databricks-connect-1) section.
+The SQL Connector does not support Databricks Serverless Compute. If you require Serverless Compute, you must use the Databricks Connect library.
### Databricks Connect
@@ -229,9 +229,7 @@ If you want Databricks to process PySpark DataFrames in SQLMesh Python models, t
SQLMesh **DOES NOT** include/bundle the Databricks Connect library. You must [install the version of Databricks Connect](https://docs.databricks.com/en/dev-tools/databricks-connect/python/install.html) that matches the Databricks Runtime used in your Databricks cluster.
-If SQLMesh detects that you have Databricks Connect installed, then it will automatically configure the connection and use it for all Python models that return a Pandas or PySpark DataFrame.
-
-To have databricks-connect installed but ignored by SQLMesh, set `disable_databricks_connect` to `true` in the connection configuration.
+SQLMesh's Databricks Connect implementation supports Databricks Runtime 13.0 or higher. If SQLMesh detects that you have Databricks Connect installed, then it will use it for all Python models (both Pandas and PySpark DataFrames).
Databricks Connect can execute SQL and DataFrame operations on different clusters by setting the SQLMesh `databricks_connect_*` connection options. For example, these options could configure SQLMesh to run SQL on a [Databricks SQL Warehouse](https://docs.databricks.com/sql/admin/create-sql-warehouse.html) while still routing DataFrame operations to a normal Databricks Cluster.
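The install step described above can be sketched as follows. This is only an illustration: `14.3` is a placeholder Runtime version, and you should pin `databricks-connect` to whatever major.minor release your cluster's Databricks Runtime actually reports.

```
# Look up your cluster's Databricks Runtime version in the cluster details,
# then pin databricks-connect to the matching major.minor release.
# 14.3 below is a placeholder example only.
pip install "databricks-connect==14.3.*"
```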
@@ -261,7 +259,7 @@ The only relevant SQLMesh configuration parameter is the optional `catalog` para
|`databricks_connect_server_hostname`| Databricks Connect Only: Databricks Connect server hostname. Uses `server_hostname` if not set. | string | N |
|`databricks_connect_access_token`| Databricks Connect Only: Databricks Connect access token. Uses `access_token` if not set. | string | N |
|`databricks_connect_cluster_id`| Databricks Connect Only: Databricks Connect cluster ID. Uses `http_path` if not set. Cannot be a Databricks SQL Warehouse. | string | N |
-|`databricks_connect_use_serverless`| Databricks Connect Only: Use a serverless cluster for Databricks Connect instead of `databricks_connect_cluster_id`. | bool | N |
+|`databricks_connect_use_serverless`| Databricks Connect Only: Use a serverless cluster for Databricks Connect. If serverless is used, the SQL Connector is disabled, since Serverless Compute is not supported by the SQL Connector. | bool | N |
|`force_databricks_connect`| When running locally, force the use of Databricks Connect for all model operations (so don't use SQL Connector for SQL models) | bool | N |
|`disable_databricks_connect`| When running locally, disable the use of Databricks Connect for all model operations (so use SQL Connector for all models) | bool | N |
|`disable_spark_session`| Do not use SparkSession if it is available (like when running in a notebook). | bool | N |
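Taken together, the options in this table could be combined in a SQLMesh `config.yaml` along these lines. This is a sketch, not a definitive configuration: the gateway name, hostname, HTTP path, and cluster ID are placeholders, and reading the token from an environment variable via `env_var` assumes SQLMesh's YAML templating is available in your setup.

```yaml
gateways:
  databricks:
    connection:
      type: databricks
      server_hostname: dbc-XXXXXXXX.cloud.databricks.com   # placeholder hostname
      http_path: /sql/1.0/warehouses/XXXXXXXXXXXXXXXX      # placeholder SQL Warehouse path
      access_token: "{{ env_var('DATABRICKS_TOKEN') }}"    # token from the environment
      catalog: main
      # Route DataFrame operations to a separate all-purpose cluster:
      databricks_connect_cluster_id: 0123-456789-abcdefgh  # placeholder cluster ID
      # Or use Serverless Compute for Databricks Connect instead
      # (this disables the SQL Connector, which does not support serverless):
      # databricks_connect_use_serverless: true
```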