RAG First Principles
Christopher David edited this page Nov 16, 2023 · 17 revisions
Why
- We need an agent to act on knowledge it wasn't trained on.
What
- Retrieval-Augmented Generation (RAG) is a technique in which an AI first retrieves relevant information from a database, then uses that information to generate an informed response.
How
- Data must be stored in a database such that the agent can find information relevant to a query
- File has many Embeddings
- Both file and query will use the same embedding model, ____
- A user query should be converted into a search over documents for appropriate context
- The user query should be converted from natural language to a vector embedding
- Perform the search
- A cosine similarity search should be run between the query embedding and the user's file embeddings
- Return the most relevant documents
- content and metadata for each
- How should we limit the number of results? Should we try to fill the context window with as much content as possible?
- Combine resulting text/metadata into a prompt
- Send prompt to LLM to synthesize an answer
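The search and prompt-building steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the `Embedding` dataclass, `search`, `build_prompt`, and the `max_context_chars` budget are all hypothetical names, and the vectors are hand-written stand-ins for whatever the (still unspecified) embedding model would return. The character budget is only one possible answer to the open question about limiting results. The final step, sending the prompt to an LLM, is left out.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Embedding:
    """A chunk of a File plus its vector (hypothetical schema)."""
    file_name: str
    content: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def search(query_vector, embeddings, top_k=3):
    """Return the top_k (score, embedding) pairs, best match first."""
    scored = [(cosine_similarity(query_vector, e.vector), e) for e in embeddings]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question, hits, max_context_chars=2000):
    """Join retrieved chunks into a prompt, stopping at a character budget."""
    parts = []
    used = 0
    for _score, e in hits:
        block = f"[{e.file_name}] {e.content}"
        if used + len(block) > max_context_chars:
            break  # assumption: drop everything past the budget
        parts.append(block)
        used += len(block)
    context = "\n".join(parts)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


# Hand-written 3-dimensional vectors stand in for real model output.
store = [
    Embedding("pets.txt", "Cats purr when content.", [0.9, 0.1, 0.0]),
    Embedding("rust.txt", "Rust enforces ownership at compile time.", [0.0, 0.2, 0.9]),
    Embedding("geo.txt", "Paris is the capital of France.", [0.1, 0.9, 0.1]),
]
query_vector = [0.0, 1.0, 0.0]  # pretend embedding of the question
hits = search(query_vector, store, top_k=2)
prompt = build_prompt("What is the capital of France?", hits)
```

In a real system the linear scan in `search` would likely be replaced by a vector index in the database, but the ranking rule stays the same: the query and the files must be embedded with the same model for the scores to be comparable.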
Buildout Order
- Write unit tests for the new data models, Chunk and Embedding, and their relationship with File
- Write feature tests for:
- creating and retrieving embeddings from database
- converting user query into embedding
- running cosine similarity search
- generating text inference
- full RAG query flow
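As an illustration of the first items in this list, here is a hedged sketch of a test for creating and retrieving embeddings and for the "File has many Embeddings" relationship. It uses Python's standard `sqlite3` and `unittest` modules with an in-memory database. The table layout, the helper names (`add_file`, `add_embedding`, `embeddings_for_file`), and the choice to store vectors as JSON text are assumptions made for the example. The chunk text is folded into the `embeddings` row for brevity, rather than modeled as a separate Chunk.

```python
import json
import sqlite3
import unittest

# Hypothetical schema: one file row, many embedding rows pointing at it.
SCHEMA = """
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE embeddings (
    id INTEGER PRIMARY KEY,
    file_id INTEGER NOT NULL REFERENCES files(id),
    content TEXT NOT NULL,  -- the chunk text
    vector TEXT NOT NULL    -- JSON-encoded list of floats
);
"""


def add_file(db, name):
    """Insert a file and return its id."""
    return db.execute("INSERT INTO files (name) VALUES (?)", (name,)).lastrowid


def add_embedding(db, file_id, content, vector):
    """Insert one embedding row for a file and return its id."""
    cur = db.execute(
        "INSERT INTO embeddings (file_id, content, vector) VALUES (?, ?, ?)",
        (file_id, content, json.dumps(vector)),
    )
    return cur.lastrowid


def embeddings_for_file(db, file_id):
    """Return (content, vector) pairs for a file, in insertion order."""
    rows = db.execute(
        "SELECT content, vector FROM embeddings WHERE file_id = ? ORDER BY id",
        (file_id,),
    ).fetchall()
    return [(content, json.loads(vector)) for content, vector in rows]


class EmbeddingStorageTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test keeps tests independent.
        self.db = sqlite3.connect(":memory:")
        self.db.executescript(SCHEMA)

    def tearDown(self):
        self.db.close()

    def test_file_has_many_embeddings(self):
        file_id = add_file(self.db, "notes.txt")
        add_embedding(self.db, file_id, "first chunk", [0.1, 0.2])
        add_embedding(self.db, file_id, "second chunk", [0.3, 0.4])
        self.assertEqual(
            embeddings_for_file(self.db, file_id),
            [("first chunk", [0.1, 0.2]), ("second chunk", [0.3, 0.4])],
        )

    def test_embeddings_are_scoped_to_their_file(self):
        a = add_file(self.db, "a.txt")
        b = add_file(self.db, "b.txt")
        add_embedding(self.db, a, "only in a", [1.0])
        self.assertEqual(embeddings_for_file(self.db, b), [])
```

Saved as a module, the tests can be run with `python -m unittest`. The later items (query embedding, similarity search, inference, full flow) would follow the same pattern, each with its own test case.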