RAG First Principles

Christopher David edited this page Nov 16, 2023 · 17 revisions

Why

  • We need an agent to act on knowledge it wasn't trained on.

What

  • Retrieval-Augmented Generation (RAG) is a technique in which a model first retrieves relevant information from a database, then uses that information to generate an informed response.

How

  • Data must be stored in a database such that the agent can find information relevant to a query
    • File has many Embeddings
    • Both file and query will use the same embedding model, ____
  • A user query should be converted into a search over documents for appropriate context
    • The user query should be converted from natural language to a vector embedding
  • Perform the search
    • A cosine similarity search should be run between the query embedding and the user's file embeddings
  • Return the most relevant documents
    • Include the content and metadata for each
    • How should we limit the number of results? Should we try to fill the context window with as much content as possible?
  • Combine resulting text/metadata into a prompt
  • Send prompt to LLM to synthesize an answer
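
The retrieval steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the `Chunk` shape, the `search` and `build_prompt` helpers, and the character budget `max_chars` are all hypothetical, and the vectors are hand-written stand-ins because the embedding model is not yet chosen. The budget loop shows only one possible answer to the open question of how to limit results.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stored unit: text, its metadata, and its embedding vector."""
    content: str
    metadata: dict
    embedding: list


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_embedding, chunks, max_chars=2000):
    """Rank chunks by similarity to the query, then keep the best ones
    until the (assumed) character budget would be exceeded."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine_similarity(query_embedding, c.embedding),
        reverse=True,
    )
    selected, used = [], 0
    for chunk in ranked:
        if used + len(chunk.content) > max_chars:
            break
        selected.append(chunk)
        used += len(chunk.content)
    return selected


def build_prompt(question, chunks):
    """Combine retrieved text and metadata into a single prompt string."""
    context = "\n\n".join(
        f"[{c.metadata.get('file', 'unknown')}] {c.content}" for c in chunks
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy data: in practice the vectors would come from the shared embedding model.
chunks = [
    Chunk("Cats purr.", {"file": "cats.txt"}, [1.0, 0.0, 0.0]),
    Chunk("Dogs bark.", {"file": "dogs.txt"}, [0.0, 1.0, 0.0]),
]
results = search([0.9, 0.1, 0.0], chunks, max_chars=15)
print(build_prompt("What sound do cats make?", results))
```

The resulting prompt string is what would be sent to the LLM in the final step. A brute-force scan like this is only practical for small collections; a database with vector search support would replace `search` in a real deployment.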

Buildout Order

  • Write unit tests for new data models Chunk+Embedding and relationship with File
  • Write feature tests for:
    • creating and retrieving embeddings from database
    • converting user query into embedding
    • running cosine similarity search
    • generating text inference
    • full RAG query flow
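
As a starting point for the first buildout item, here is a rough sketch of a unit test for the data models. It is self-contained and uses plain Python dataclasses rather than any particular ORM. The field names, the `add_chunk` helper, and the assumption that each Chunk carries exactly one Embedding are illustrative guesses; only "File has many Embeddings" comes from the outline above.

```python
import unittest
from dataclasses import dataclass, field


@dataclass
class Embedding:
    vector: list  # hypothetical field: the raw embedding values


@dataclass
class Chunk:
    content: str
    embedding: Embedding  # assumption: one embedding per chunk


@dataclass
class File:
    name: str
    chunks: list = field(default_factory=list)

    def add_chunk(self, content, vector):
        """Attach a new chunk (with its embedding) to this file."""
        chunk = Chunk(content=content, embedding=Embedding(vector=vector))
        self.chunks.append(chunk)
        return chunk

    def embeddings(self):
        """All embeddings belonging to this file, in insertion order."""
        return [chunk.embedding for chunk in self.chunks]


class FileModelTest(unittest.TestCase):
    def test_new_file_has_no_embeddings(self):
        self.assertEqual(File(name="empty.txt").embeddings(), [])

    def test_file_has_many_embeddings(self):
        f = File(name="notes.txt")
        f.add_chunk("first", [0.1, 0.2])
        f.add_chunk("second", [0.3, 0.4])
        self.assertEqual(len(f.embeddings()), 2)
        self.assertEqual(f.embeddings()[1].vector, [0.3, 0.4])
```

Saved to a file, the tests can be run with `python -m unittest <module>`. The same assertions would carry over to whatever persistence layer the project ends up using.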