RAG First Principles

Christopher David edited this page Nov 16, 2023 · 17 revisions

Why

We need an agent to act on knowledge it wasn't trained on.

What

Retrieval-Augmented Generation (RAG) is a technique in which an AI first retrieves relevant information from a database, then uses that information to generate an informed response.

How

Data must be stored in a database such that the agent can find information relevant to a query
- File has many Embeddings
- Both file and query will use the same embedding model, ____
A user query should be converted into a search over documents for appropriate context
- The user query should be converted from natural language to a vector embedding
Perform the search
- A cosine similarity search should be run between the query embedding and the user's file embeddings
Return the most relevant documents
- Return the content and metadata for each document
- How should we limit the number of results? Should we try to fill up the context window with as much content as possible?
Combine resulting text/metadata into a prompt
Send prompt to LLM to synthesize an answer
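
The retrieval steps above can be sketched in a few functions. This is a minimal illustration, not the project's implementation: the `Chunk` shape, the `retrieve` and `build_prompt` helpers, and the `top_k` and `max_chars` parameters are all hypothetical names introduced here. The embedding model is left out because it is not yet chosen, so the sketch takes an already-computed query embedding. On the open question of how to limit results, it takes one possible stance (a result cap plus a size budget, with characters standing in for tokens) purely for illustration.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """A piece of a file's text plus its embedding (hypothetical shape)."""
    content: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, chunks, top_k=3, max_chars=2000):
    """Rank chunks by similarity, then keep the best ones that fit the budget.

    max_chars is an assumed stand-in for a token budget.
    """
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_embedding, c.embedding),
        reverse=True,
    )
    selected, used = [], 0
    for chunk in ranked[:top_k]:
        if used + len(chunk.content) > max_chars:
            break  # stop once the next chunk would exceed the budget
        selected.append(chunk)
        used += len(chunk.content)
    return selected


def build_prompt(question, chunks):
    """Combine retrieved text and metadata into a single prompt string."""
    context = "\n\n".join(
        f"[source: {c.metadata.get('file', 'unknown')}]\n{c.content}"
        for c in chunks
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Usage with hand-written vectors in place of real embeddings.
docs = [
    Chunk("Invoices are due in 30 days.", [1.0, 0.0], {"file": "billing.txt"}),
    Chunk("The office closes at 6pm.", [0.0, 1.0], {"file": "hours.txt"}),
]
best = retrieve([0.9, 0.1], docs, top_k=1)
prompt = build_prompt("When are invoices due?", best)
```

The resulting `prompt` would then be sent to the LLM. A real system would likely run the similarity search inside the database rather than sorting in application code; the in-memory sort here only keeps the example self-contained.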

Buildout Order

Write unit tests for the new data models, Chunk and Embedding, and their relationship with File
Write feature tests for:
- creating and retrieving embeddings from database
- converting user query into embedding
- running cosine similarity search
- generating text inference
- full RAG query flow
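
As a rough sketch of what the first unit test in this list might look like, the example below models the relationship in plain Python. The class fields, the `add_chunk` helper, and the assumption that each Chunk carries exactly one Embedding are illustrative guesses, not the project's actual schema or test framework.

```python
import unittest
from dataclasses import dataclass, field


@dataclass
class Embedding:
    vector: list[float]


@dataclass
class Chunk:
    content: str
    embedding: Embedding  # assumption: one embedding per chunk


@dataclass
class File:
    name: str
    chunks: list[Chunk] = field(default_factory=list)

    def add_chunk(self, content, vector):
        """Attach a new chunk and its embedding to this file (hypothetical helper)."""
        chunk = Chunk(content=content, embedding=Embedding(vector=vector))
        self.chunks.append(chunk)
        return chunk

    @property
    def embeddings(self):
        """All embeddings reachable through this file's chunks."""
        return [chunk.embedding for chunk in self.chunks]


class FileEmbeddingRelationshipTest(unittest.TestCase):
    def test_file_has_many_embeddings(self):
        file = File(name="notes.txt")
        file.add_chunk("first chunk", [0.1, 0.2])
        file.add_chunk("second chunk", [0.3, 0.4])
        self.assertEqual(len(file.embeddings), 2)
        self.assertEqual(file.embeddings[1].vector, [0.3, 0.4])

    def test_new_file_has_no_embeddings(self):
        self.assertEqual(File(name="empty.txt").embeddings, [])
```

Saved to a file, the tests could be run with `python -m unittest <file>`. The feature tests in the list would follow the same pattern but exercise the database and the embedding model rather than in-memory objects.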