|
79 | 79 | }, |
80 | 80 | "outputs": [], |
81 | 81 | "source": [ |
82 | | - "%pip install datasetsforecast==0.0.8 --quiet\n", |
| 82 | + "%pip install datasetsforecast==0.0.8 pandas==2.2.3 --quiet\n", |
83 | 83 | "dbutils.library.restartPython()" |
84 | 84 | ] |
85 | 85 | }, |
|
174 | 174 | "\n", |
175 | 175 | "def create_m4_monthly():\n", |
176 | 176 | " y_df, _, _ = M4.load(directory=str(pathlib.Path.home()), group=\"Monthly\")\n", |
177 | | - " _ids = [f\"M{i}\" for i in range(1, n + 1)]\n", |
| 177 | + "    target_ids = {f\"M{i}\" for i in range(1, n + 1)}\n", |
| 178 | + " y_df = y_df[y_df[\"unique_id\"].isin(target_ids)]\n", |
178 | 179 | " y_df = (\n", |
179 | | - " y_df.groupby(\"unique_id\")\n", |
180 | | - " .filter(lambda x: x.unique_id.iloc[0] in _ids)\n", |
181 | | - " .groupby(\"unique_id\")\n", |
182 | | - " .apply(transform_group)\n", |
183 | | - " .reset_index(drop=True)\n", |
| 180 | + " y_df.groupby(\"unique_id\", group_keys=False)\n", |
| 181 | + " .apply(lambda g: transform_group(g, g.name))\n", |
| 182 | + " .reset_index(drop=True)\n", |
184 | 183 | " )\n", |
185 | 184 | " return y_df\n", |
186 | 185 | "\n", |
187 | 186 | "\n", |
188 | | - "def transform_group(df):\n", |
189 | | - " unique_id = df.unique_id.iloc[0]\n", |
190 | | - " _cnt = 60 # df.count()[0]\n", |
191 | | - " _start = pd.Timestamp(\"2018-01-01\")\n", |
192 | | - " _end = _start + pd.DateOffset(months=_cnt)\n", |
193 | | - " date_idx = pd.date_range(start=_start, end=_end, freq=\"M\", name=\"date\")\n", |
194 | | - " _df = (\n", |
195 | | - " pd.DataFrame(data=[], index=date_idx)\n", |
196 | | - " .reset_index()\n", |
197 | | - " .rename(columns={\"index\": \"date\"})\n", |
198 | | - " )\n", |
199 | | - " _df[\"unique_id\"] = unique_id\n", |
200 | | - " _df[\"y\"] = df[:60].y.values\n", |
201 | | - " return _df\n" |
| 187 | + "def transform_group(df, unique_id):\n", |
| 188 | + " if len(df) > 60:\n", |
| 189 | + " df = df.iloc[-60:]\n", |
| 190 | + " start = pd.Timestamp(\"2018-01-01\")\n", |
| 191 | + " date_idx = pd.date_range(start=start, periods=len(df), freq=\"ME\", name=\"ds\")\n", |
| 192 | + " res_df = pd.DataFrame({\n", |
| 193 | + " \"ds\": date_idx,\n", |
| 194 | + " \"unique_id\": unique_id,\n", |
| 195 | + " \"y\": df[\"y\"].to_numpy()\n", |
| 196 | + " })\n", |
| 197 | + " return res_df" |
202 | 198 | ] |
203 | 199 | }, |
204 | 200 | { |
|
309 | 305 | }, |
310 | 306 | "outputs": [], |
311 | 307 | "source": [ |
312 | | - "display(spark.sql(f\"select unique_id, count(date) as count from {catalog}.{db}.m4_monthly_train group by unique_id order by unique_id\"))" |
| 308 | + "display(spark.sql(f\"select unique_id, count(ds) as count from {catalog}.{db}.m4_monthly_train group by unique_id order by unique_id\"))" |
313 | 309 | ] |
314 | 310 | }, |
315 | 311 | { |
|
331 | 327 | "outputs": [], |
332 | 328 | "source": [ |
333 | 329 | "display(\n", |
334 | | - " spark.sql(f\"select * from {catalog}.{db}.m4_monthly_train where unique_id in ('M1', 'M2', 'M3', 'M4', 'M5') order by unique_id, date\")\n", |
| 330 | + " spark.sql(f\"select * from {catalog}.{db}.m4_monthly_train where unique_id in ('M1', 'M2', 'M3', 'M4', 'M5') order by unique_id, ds\")\n", |
335 | 331 | " )" |
336 | 332 | ] |
337 | 333 | }, |
|
547 | 543 | "display(spark.sql(f\"\"\"\n", |
548 | 544 | " select * from {catalog}.{db}.monthly_scoring_output \n", |
549 | 545 | " where unique_id = 'M1'\n", |
550 | | - " order by unique_id, model, date\n", |
| 546 | + " order by unique_id, model, ds\n", |
551 | 547 | " \"\"\"))" |
552 | 548 | ] |
553 | 549 | }, |
|
0 commit comments