Skip to content

Latest commit

 

History

History
74 lines (58 loc) · 4.4 KB

Overview of Agentic RAG.MD

File metadata and controls

74 lines (58 loc) · 4.4 KB

#RAG #agents

Core Components of Agentic RAG

Agentic RAG has two core components.

  1. Dynamic planning and execution
    1. The agentic strategy we use to plan and execute against a user query we receive.
  2. Retrieval pipeline
    1. Document parsing and chunking strategy: How we extract text from various document types and chunk them into smaller sub-documents.
    2. Embedding models: The models we use to transform the sub-documents into numeric vector representations that we can store in vector databases and retrieve later on.
    3. Document retrieval strategy: The strategy we use to retrieve the sub-documents to answer queries.

Dynamic Planning and Execution

  • LangGraph uses a computational graph to execute agentic workflows.

Retrieval Pipeline

Retrieval pipeline will consist of a document parser, an embedding model, and a similarity metric (for which we retrieve our embeddings).

Document Parsing & Chunking Strategy

Document parsing should have the following:

  • Flexibility: Handles a range of document types including pdf, html, txt etc.
  • Structurally Aware: Must be able to parse documents such that the structure of the document is preserved. This will include things like preserving tables.

Embedding & Retrieval Strategy

graph TD
    subgraph "Bi-Encoder"
        A["Sentence A"] --> B1["BERT"]
        B["Sentence B"] --> B2["BERT"]
        B1 --> P1["pooling"]
        B2 --> P2["pooling"]
        P1 --> U["u"]
        P2 --> V["v"]
        U --> CS["Cosine-Similarity"]
        V --> CS
    end

    subgraph "Cross-Encoder"
        C["Sentence A"] --> BE["BERT"]
        D["Sentence B"] --> BE
        BE --> CL["Classifier"]
        CL --> O["0..1"]
    end
Loading

Source

Sentence embedding models come in two flavours, bi-encoders and cross-encoders.

  • Bi-Encoders: Produce separate sentence embedding for each input text. They are faster but less accurate than cross encoders. Useful for semantic search, or Information retrieval due to the efficiency,
  • Cross-Encoders: Process pairs of sentences together to produce a similarity score. They are slower, but more accurate than bi-encoders. Use For small pre-defined data sets, cross encoders can be useful.

It's common practice to combine bi-encoders and cross-encoders in a multi-stage approach. The bi-encoders will help produce the initial retrieval candidates. The cross encoders are used to re-rank top candidates for higher accuracy.

Knowledge Graph Retrieval

In Hybrid Mode, Jar3d creates a Neo4j knowledge graph from the ingested documents. We have found that combining KG retrieval with RAG improves it's ability to answer more complex queries.

We leverage an LLM Graph Transformer to create a knowledge graph from the documents.

We query the knowlede graph using Cypher Query Language to isolate the most relevant relationhips and nodes. This retireved context is fed into the LLM alongside the context retrievd from the similarity search and reranking with the cross-encoder.

Agentic RAG Video

Tooling

LLM Sherpa - Open Source Covers document parsing

LLM Sherpa: Handles a variety of documents and provides some structural awareness. Allows self hosting via the backend service which is fully open sourced. LLM Sherpa does smart chunking which preserves the integrity of text by keeping related text together. Research suggests that it struggles with more complex PDF doc types. Platforms:

Tools for Embedding

Langchain integrations make it easy to leverage a variety of embedding model services.