Update learn section of genai_cookbook site to Agents #33

Merged: 8 commits, Oct 9, 2024

Conversation

prithvikannan
Collaborator

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

How is this PR tested?

@prithvikannan changed the title from "Update learn section of cookbook to Agents" to "Update learn section of genai_cookbook site to Agents" on Oct 2, 2024

@bbqiu left a comment

changes LGTM given all the wording is pulled from other docs. Don't feel super qualified to review genai_cookbook/nbs/1-introduction-to-rag.md haha

Collaborator

epec254 commented Oct 2, 2024

What is the intent of this PR? My read of it is that we are changing "RAG" to "Agent". Instead, I would suggest we do the following:

  1. Discuss upfront that there are many types of agents - function-calling agents w/ tools, rag-only agents, etc.
  2. Position the current RAG content as how you should think about evaluating the rag-only agent OR the retriever within a function-calling agent.

  3. (P1) Add more content about function-calling agents.

Signed-off-by: Prithvi Kannan <[email protected]>
Signed-off-by: Prithvi Kannan <[email protected]>
Signed-off-by: Prithvi Kannan <[email protected]>
Signed-off-by: Prithvi Kannan <[email protected]>

Unstructured data lacks a predefined data model or schema, making it impossible to query on the basis of structure and metadata alone. As a result, unstructured data requires techniques that can understand and extract semantic meaning from raw text, images, audio, or other content.

- During data preparation, the RAG application's data pipeline takes raw unstructured data and transforms it into discrete chunks that can be queried based on their relevance to a user's query. The key steps in data preprocessing are outlined below. Each step has a variety of knobs that can be tuned - for a deeper dive discussion on these knobs, please refer to the [deep dive into RAG section.](/nbs/3-deep-dive)
+ During data preparation, the Agent application's data pipeline takes raw unstructured data and transforms it into discrete chunks that can be queried based on their relevance to a user's query. The key steps in data preprocessing are outlined below. Each step has a variety of knobs that can be tuned - for a deeper dive discussion on these knobs, please refer to the [deep dive into RAG section.](/nbs/3-deep-dive)
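
A minimal sketch of the chunk-then-query idea in the paragraph above, for illustration only. It assumes fixed-size word windows with overlap and a toy word-overlap score. The names `chunk_text`, `score`, and `top_chunks` are hypothetical, and a real pipeline would typically use an embedding model and a vector index rather than word counts.

```python
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping windows of words (assumed fixed-size strategy)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphanumeric runs only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, chunk: str) -> int:
    """Toy relevance: number of query tokens also present in the chunk.

    This is a stand-in for embedding similarity, not a recommended metric.
    """
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    return sum(min(q[w], c[w]) for w in q)


def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the highest toy relevance score."""
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks. Chunk size and overlap are examples of the tunable knobs the paragraph mentions.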

```{image} ../images/2-fundamentals-unstructured/2_img.png
```
Collaborator

This image should also prob not say "RAG Chain"

Collaborator

Also, on line 20 we should update:

> The following are the typical steps of a data pipeline in a RAG application using unstructured data:

Collaborator

@smurching left a comment

Leaving some comments, thanks @prithvikannan !

Signed-off-by: Prithvi Kannan <[email protected]>
Signed-off-by: Prithvi Kannan <[email protected]>

When using a standalone LLM, a user submits a request, such as a question, to the LLM, and the LLM responds with an answer based solely on its training data.

- In its most basic form, the following steps happen in a RAG application:
+ In its most basic form, the following steps happen in an agent application with a retriever tool:

1. **Retrieval:** The **user's request** is used to query some outside source of information. This might mean querying a vector store, conducting a keyword search over some text, or querying a SQL database. The goal of the retrieval step is to obtain **supporting data** that will help the LLM provide a useful response.
Collaborator

This is different too in agents right? Maybe we can copy from the steps you wrote out below in genai_cookbook/nbs/2-fundamentals-unstructured-chain.md

Collaborator

(ie the agent decides whether it needs to do retrieval, vs it being fixed as the first step)
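
To make the distinction in this thread concrete, here is a rough sketch contrasting a fixed RAG chain, where retrieval always runs first, with an agent that decides whether to call a retriever tool. Everything in it is hypothetical: `stub_llm` stands in for a real model call, `DOCS` is a toy corpus, and `choose_tool` uses a keyword rule where a real agent would rely on the LLM's own tool-calling decision.

```python
from typing import Callable, Optional

# Toy corpus keyed by topic (hypothetical data for illustration).
DOCS = {
    "chunking": "Chunking splits parsed documents into smaller pieces.",
    "embedding": "Embedding turns each chunk into a vector.",
}


def retrieve(query: str) -> list[str]:
    """Toy retriever tool: return docs whose topic key appears in the query."""
    q = query.lower()
    return [text for key, text in DOCS.items() if key in q]


def stub_llm(prompt: str) -> str:
    """Stand-in for an LLM call; it only wraps the prompt it was given."""
    return f"ANSWER({prompt})"


def rag_chain(query: str) -> tuple[str, bool]:
    """Fixed chain: retrieval is always the first step."""
    context = retrieve(query)
    return stub_llm(f"context={context} question={query}"), True


def choose_tool(query: str) -> Optional[Callable[[str], list[str]]]:
    """Assumed stand-in for the LLM's tool choice, using a keyword rule."""
    return retrieve if any(key in query.lower() for key in DOCS) else None


def agent(query: str) -> tuple[str, bool]:
    """Agent: retrieval happens only if the retriever tool is selected."""
    tool = choose_tool(query)
    if tool is None:
        return stub_llm(f"question={query}"), False
    context = tool(query)
    return stub_llm(f"context={context} question={query}"), True
```

Both functions return the answer plus a flag saying whether retrieval ran. For a greeting such as "hello", `rag_chain` still retrieves while `agent` skips the tool. Once the retriever tool is selected, the two paths are the same.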

Collaborator

Or maybe we could actually have a table comparing RAG steps vs agent steps, so that the pic below still makes sense (we could mention that the diagram shows RAG but for agents, the LLM would be the one deciding to perform retrieval)

Collaborator Author

lol oops. i inlined the steps from genai_cookbook/nbs/2-fundamentals-unstructured-chain.md

i think the image is still okay, i just added some text that says this is what happens if the retriever tool is selected.

Collaborator

@smurching left a comment

Mostly looks good! Some last comments - on the images, if we can't get updated graphics, we should probably hide ones that refer to RAG in pages besides the intro, because I think at that point they make the RAG vs agent distinction more confusing than they help

Signed-off-by: Prithvi Kannan <[email protected]>
Collaborator

@smurching left a comment

Thanks for the updates, LGTM!

@prithvikannan merged commit 6e3b334 into databricks:main on Oct 9, 2024
2 checks passed
4 participants