#RAG #agents
Agentic RAG has two core components.
- Dynamic planning and execution
- The agentic strategy we use to plan and execute against a user query we receive.
- Retrieval pipeline
- Document parsing and chunking strategy: How we extract text from various document types and chunk them into smaller sub-documents.
- Embedding models: The models we use to transform the sub-documents into numeric vector representations that we can store in vector databases and retrieve later on.
- Document retrieval strategy: The strategy we use to retrieve the sub-documents to answer queries.
- LangGraph uses a computational graph to execute agentic workflows.
Retrieval pipeline will consist of a document parser, an embedding model, and a similarity metric (for which we retrieve our embeddings).
Document parsing should have the following:
- Flexibility: Handles a range of document types including pdf, html, txt etc.
- Structurally Aware: Must be able to parse documents such that the structure of the document is preserved. This will include things like preserving tables.
graph TD
subgraph "Bi-Encoder"
A["Sentence A"] --> B1["BERT"]
B["Sentence B"] --> B2["BERT"]
B1 --> P1["pooling"]
B2 --> P2["pooling"]
P1 --> U["u"]
P2 --> V["v"]
U --> CS["Cosine-Similarity"]
V --> CS
end
subgraph "Cross-Encoder"
C["Sentence A"] --> BE["BERT"]
D["Sentence B"] --> BE
BE --> CL["Classifier"]
CL --> O["0..1"]
end
Sentence embedding models come in two flavours, bi-encoders and cross-encoders.
- Bi-Encoders: Produce separate sentence embedding for each input text. They are faster but less accurate than cross encoders. Useful for semantic search, or Information retrieval due to the efficiency,
- Cross-Encoders: Process pairs of sentences together to produce a similarity score. They are slower, but more accurate than bi-encoders. Use For small pre-defined data sets, cross encoders can be useful.
It's common practice to combine bi-encoders and cross-encoders in a multi-stage approach. The bi-encoders will help produce the initial retrieval candidates. The cross encoders are used to re-rank top candidates for higher accuracy.
In Hybrid Mode, Jar3d creates a Neo4j knowledge graph from the ingested documents. We have found that combining KG retrieval with RAG improves it's ability to answer more complex queries.
We leverage an LLM Graph Transformer to create a knowledge graph from the documents.
We query the knowlede graph using Cypher Query Language to isolate the most relevant relationhips and nodes. This retireved context is fed into the LLM alongside the context retrievd from the similarity search and reranking with the cross-encoder.
LLM Sherpa - Open Source Covers document parsing
LLM Sherpa: Handles a variety of documents and provides some structural awareness. Allows self hosting via the backend service which is fully open sourced. LLM Sherpa does smart chunking which preserves the integrity of text by keeping related text together. Research suggests that it struggles with more complex PDF doc types. Platforms:
Langchain integrations make it easy to leverage a variety of embedding model services.
- For bi-encoders FastEmbed by Qdrant is fast and lightweight (good for PoCs, search apps).
- For re-ranking with Cross-Encoders, Langchain provides a variety of options.