Commit c3f12c7

Refined cookbook (#45)

epec254 and smurching authored
* Agent config
* Global config
* Global config
* Data pipeline v1
* Eric's updates to refine cookbook
* Move function calling agent config into its own folder
* RAG-only agent
* Config folder
* Remove Eric's hardcoded VS endpoint
* Respect token config in data pipeline
* Configs to a single cell
* Fix print of config
* Speed up parsing by caching the parsed table
* Fix error handling in create VS index
* Remove debug code
* Clean up: use new cookbook config
* Refactor the agent notebooks
* Remove extra init
* Fix data pipeline MLflow tag
* Default to Llama
* Improve debugging of failed records
* Add debug code
* Simplifications
* WIP updates; made it partway through the agent notebooks
* WIP debugging of function_calling_agent_mlflow_sdk.py
* WIP. Current state: 1) Dogfood seems to have an issue where tool calling returns a nonexistent function name; see the trace at https://e2-dogfood.staging.cloud.databricks.com/editor/notebooks/2948323364468680?o=6051921418418893#command/397386389372146. 2) The RAG-only agent raises "AttributeError: 'LLMConfig' object has no attribute 'tools'" because the RAG agent config object doesn't have a tools field; see https://e2-dogfood.staging.cloud.databricks.com/editor/notebooks/2948323364468746?o=6051921418418893#command/397386389372955
* WIP
* WIP updating the Pydantic config structure
* More progress. Remaining issues:
  * utils modules are not importable when agent notebooks are run directly
  * Need to fix the logging util
  * Autologged traces and manual traces in LangChain don't interleave properly
* WIP
* WIP, more progress. Remaining items:
  * Restore ModelConfig across examples
  * Switch back to the OpenAI SDK
  * Document the tools class
* WIP; moving away from ModelConfig due to several devex issues
* WIP; got most of the code to work
* Clean up traces for LangChain
* Update documentation
* Switch back to printing the docstring, since it's cleaner than help()
* Data pipeline: refactor all utils for ease of use & the ability to run in a local IDE
* Update .gitignore
* Locally tested tool code
* Working function-calling agent
* Remove MLflow agent
* Databricks utils
* Data pipeline bug fixes + switch to content from doc_content
* Working data pipeline (except for install of utils)
* Refactor agent configs to make room for Genie
* Remove unused code
* Genie agent
* Commit for serializable model with all the old code in comments (just in case)
* Remove old code from serializable model
* Shared config loader
* Ignore tmp configs
* tmp config README
* Initial multi-agent
* Multi-agent works w/ function calling; not tested with Genie
* Data pipeline configs support serialized model
* All agents work locally
* Clean up
* Add UC tool
* Refactor, part 1
* Tools refactor, part 2
* Make the function-calling agent work with the new dirs
* Agent logging works
* Add tmp files to see the full diff
* FC agent actually works; remove the dead-code agent that wasn't refactored
* Rename FC agent
* Rename multi-agent
* Rename common to shared in agents
* Move config base to __init__
* Move get_current_user_info to db utils
* Missing imports
* Multi-agent works locally
* Fix bug where, if the FC agent is called by the supervisor directly after another agent, it switched the last message from role=assistant to role=user
* Local funcs work
* Multi-agent refactor for better traces & ease of understanding
* MLflow traces show supervisor CoT
* Improve MLflow traces
* Print response vs. messages info
* Bug fixes in agent supervisor
* Local func tracing + dict for UC tool result
* UC tool parsing code for Spark exceptions
* UC tool is pytest-able
* Sample SKU tool works
* Refactor stragglers
* FC test code
* Remove RAG-only agent
* Simplify errors
* SKU translator sample tests
* Tools notebook
* Temp notebooks
* pytest
* Tests for sample tools
* Sample code-exec tool
* Data pipeline config dumps the actual UC locations
* Fix set_model
* Remove dependency on the index's source table in the vector search tool
* Remove commented-out code
* Fix set_model and debug
* load_config refactor
* Tools creation notebook
* Genie agent uses the new config loader
* Multi-agent supervisor works locally
* Tool-calling agent
* Multi-agent
* Clear data pipeline outputs
* README updates
* Multi-agent works with endpoint
* Tool-calling agent notebook
* load_config docstring
* Update notebooks
* Update .gitignore
* Clean up README
* Remove dead code
* Clean up README
* Move the new OpenAI SDK to a separate folder
* Restore existing agent code
* MLflow tracing disabled hacks
* Sample tools
* Notebook tweaks
* load_config tweaks
* Poetry env
* Tools deployment works
* Deployment logic same on all agents

---------

Signed-off-by: Sid Murching <[email protected]>
Co-authored-by: epec254 <epec254>
Co-authored-by: Sid Murching <[email protected]>
1 parent 09c476b · commit c3f12c7


68 files changed: +14,795 / -0 lines

.gitignore (8 additions, 0 deletions)
```diff
@@ -7,3 +7,11 @@ __pycache__
 
 # Exclude `databricks sync` CLI command snapshots
 .databricks
+openai_sdk_agent_app_sample_code/configs/*/*.yaml
+openai_sdk_agent_app_sample_code/configs/*.yaml
+dist/
+mlruns/
+
+_scratch_pad/
+openai_sdk_agent_app_sample_code/_scratch_pad/
+.vscode/
```

openai_sdk_agent_app_sample_code/01_data_pipeline.ipynb (947 additions, 0 deletions; large diff not rendered by default)
openai_sdk_agent_app_sample_code/00_shared_config.ipynb (282 additions, 0 deletions)
## 👉 START HERE: How to use this notebook

### Step 1: Agent storage configuration

This notebook initializes an `AgentStorageConfig` Pydantic class to define the locations where the Agent's code/config and its supporting data & metadata are stored in Unity Catalog:

- **Unity Catalog Model:** Stores staging/production versions of the Agent's code/config
- **MLflow Experiment:** Stores every development version of the Agent's code/config, each version's associated quality/cost/latency evaluation results, and any MLflow Traces from your development & evaluation processes
- **Evaluation Set Delta Table:** Stores the Agent's evaluation set

This notebook does the following:

1. Validates that the provided locations exist.
2. Serializes this configuration to `configs/agent_storage_config.yaml` so other notebooks can use it.
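The `AgentStorageConfig` class itself is defined in `cookbook/config/shared/agent_storage_location.py`, which this commit adds but the page does not render. As a minimal sketch only, assuming Pydantic v2 and inferring the field names from the configuration cell below (the validator is illustrative, not the cookbook's actual check):

```python
from pydantic import BaseModel, field_validator


class AgentStorageConfig(BaseModel):
    """Sketch of the storage-location config this notebook instantiates."""

    uc_model_name: str            # e.g. "catalog.schema.my_agent"
    evaluation_set_uc_table: str  # e.g. "catalog.schema.my_agent_eval_set"
    mlflow_experiment_name: str   # e.g. "/Users/<email>/my_agent_mlflow_experiment"

    @field_validator("uc_model_name", "evaluation_set_uc_table")
    @classmethod
    def _must_be_three_level(cls, v: str) -> str:
        # Unity Catalog object names are three-level: catalog.schema.object
        if len(v.split(".")) != 3:
            raise ValueError(f"`{v}` must be fully qualified as catalog.schema.name")
        return v
```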
**Important note:** Throughout this notebook, we indicate which cells you:

- ✅✏️ *should* customize - these cells contain config settings to change
- 🚫✏️ *typically will not* customize - these cells contain boilerplate code required to validate / save the configuration

*Cells that don't require customization still need to be run!*
### 🚫✏️ Install Python libraries

```python
%pip install -qqqq -U -r requirements.txt
%restart_python
```
### 🚫✏️ Connect to Databricks

If running locally in an IDE using Databricks Connect, connect the Spark client & configure MLflow to use Databricks Managed MLflow. If this is running in a Databricks Notebook, these values are already set.

```python
import os

from mlflow.utils import databricks_utils as du

if not du.is_in_databricks_notebook():
    from databricks.connect import DatabricksSession

    spark = DatabricksSession.builder.getOrCreate()
    # Point the MLflow client at the Databricks-managed tracking server
    os.environ["MLFLOW_TRACKING_URI"] = "databricks"
```
### 🚫✏️ Get current user info to set default values

```python
from cookbook.databricks_utils import get_current_user_info

user_email, user_name, default_catalog = get_current_user_info(spark)

print(f"User email: {user_email}")
print(f"User name: {user_name}")
print(f"Default UC catalog: {default_catalog}")
```
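`get_current_user_info` comes from `cookbook/databricks_utils.py`, which is also not rendered on this page. A minimal sketch of what such a helper could look like, assuming it resolves everything through Spark SQL built-ins (the cookbook's real implementation may differ):

```python
from pyspark.sql import SparkSession


def get_current_user_info(spark: SparkSession) -> tuple[str, str, str]:
    """Return (user_email, user_name, default_catalog) for the current user."""
    # current_user() returns the workspace identity, typically an email address
    user_email = spark.sql("SELECT current_user() AS user").collect()[0]["user"]
    # Derive a UC-friendly short name, e.g. "first.last@corp.com" -> "first_last"
    user_name = user_email.split("@")[0].replace(".", "_")
    # current_catalog() returns the session's default Unity Catalog catalog
    default_catalog = spark.sql("SELECT current_catalog() AS cat").collect()[0]["cat"]
    return user_email, user_name, default_catalog
```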
### ✅✏️ Configure your Agent's storage locations

Either review & accept the default values or enter your preferred locations.

```python
import os

import mlflow

from cookbook.config.shared.agent_storage_location import AgentStorageConfig
from cookbook.databricks_utils import get_mlflow_experiment_url

# Default values for `AgentStorageConfig`
agent_name = "my_agent_2"
uc_catalog_name = default_catalog
uc_schema_name = f"{user_name}_agents"

# Agent storage configuration
agent_storage_config = AgentStorageConfig(
    uc_model_name=f"{uc_catalog_name}.{uc_schema_name}.{agent_name}",  # UC model to store staging/production versions of the Agent's code/config
    evaluation_set_uc_table=f"{uc_catalog_name}.{uc_schema_name}.{agent_name}_eval_set",  # UC table to store the evaluation set
    mlflow_experiment_name=f"/Users/{user_email}/{agent_name}_mlflow_experiment",  # MLflow Experiment to store development versions of the Agent and their associated quality/cost/latency evaluation results + MLflow Traces
)

# Validate the UC catalog and schema for the Agent's model & evaluation table
is_valid, msg = agent_storage_config.validate_catalog_and_schema()
if not is_valid:
    raise Exception(msg)

# Set the MLflow experiment, validating that the path is valid
experiment_info = mlflow.set_experiment(agent_storage_config.mlflow_experiment_name)
# If running in a local IDE, set the MLflow experiment name as an environment variable
os.environ["MLFLOW_EXPERIMENT_NAME"] = agent_storage_config.mlflow_experiment_name

print(f"View the MLflow Experiment `{agent_storage_config.mlflow_experiment_name}` at {get_mlflow_experiment_url(experiment_info.experiment_id)}")
```
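`validate_catalog_and_schema()` is a method on `AgentStorageConfig`, and its implementation is likewise hidden on this page. A hedged sketch of the kind of existence check it plausibly performs (names and error text are illustrative, not the cookbook's actual code):

```python
from pyspark.sql import SparkSession


def validate_catalog_and_schema(uc_model_name: str, spark: SparkSession) -> tuple[bool, str]:
    """Illustrative stand-in for AgentStorageConfig.validate_catalog_and_schema()."""
    catalog, schema, _ = uc_model_name.split(".")
    try:
        # Raises AnalysisException if the catalog or schema does not exist
        spark.sql(f"DESCRIBE SCHEMA `{catalog}`.`{schema}`")
        return True, ""
    except Exception as e:
        return False, (
            f"Catalog `{catalog}` or schema `{schema}` does not exist or is not "
            f"accessible. Create it or choose another location. Details: {e}"
        )
```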
### 🚫✏️ Save the configuration for use by other notebooks

```python
from cookbook.config import serializable_config_to_yaml_file

serializable_config_to_yaml_file(agent_storage_config, "./configs/agent_storage_config.yaml")
```
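`serializable_config_to_yaml_file` is imported from `cookbook.config` (likely `cookbook/config/__init__.py`, per the "Move config base to __init__" commit above), which this page does not render. A minimal sketch, assuming a Pydantic v2 model and PyYAML; the real helper likely also records the config's class path so the shared config loader can re-instantiate it:

```python
from pathlib import Path

import yaml
from pydantic import BaseModel


def serializable_config_to_yaml_file(config: BaseModel, yaml_file_path: str) -> None:
    """Dump a Pydantic config to YAML, creating the parent directory if needed."""
    path = Path(yaml_file_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. ensure ./configs/ exists
    path.write_text(yaml.safe_dump(config.model_dump()))
```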
Notebook metadata: `notebookName: 00_shared_config`; kernel `genai-cookbook-T2SdtsNM-py3.11`; Python 3.11.10; nbformat 4.
