Welcome to the Personal Knowledge Assistant project! This guide walks you through setting up both the backend and frontend components of the system, a retrieval-augmented generation (RAG) platform for querying books and personal knowledge.
- Document Loading: Processes PDF documents using PyPDFLoader
- Text Chunking: Splits documents into manageable chunks using RecursiveCharacterTextSplitter
- Embedding Generation: Converts chunks into vector representations using HuggingFaceEmbeddings
- Vector Storage: Stores embeddings in a FAISS vector store for efficient retrieval
- Query Rewriting: Rewrites the original query to be more effective for retrieval
- Base Retrieval: Retrieves an initial set of relevant documents from the vector store
- Contextual Compression: Applies filtering and extraction to improve retrieval quality
- Document Evaluation: Evaluates each retrieved document for relevance and reliability
- Score Calculation: Combines relevance and reliability into a confidence score
- Confidence Routing: Routes the query to different processing paths based on confidence (see the sketch after this list):
  - High Confidence (>0.7): Uses direct knowledge refinement
  - Medium Confidence (0.3-0.7): Uses hybrid approach
  - Low Confidence (<0.3): Falls back to web search
- Knowledge Strip Decomposition: Breaks documents into individual "knowledge strips"
- Strip Relevance Scoring: Scores each strip's relevance to the query
- Strip Filtering: Filters strips based on relevance threshold
- Search Query Generation: Creates optimized search queries
- DuckDuckGo Search: Performs web search using DuckDuckGo
- Result Processing: Extracts and processes relevant information from search results
- Prompt Template: Assembles a prompt with context, confidence level, and query
- Conversation Memory: Maintains chat history for contextual responses
- LLM Generation: Generates final response using Groq LLM (Mistral model)
- Response Formatting: Formats response based on confidence level with appropriate caveats
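To make the thresholds concrete, here is a minimal sketch of the confidence routing. How relevance and reliability are actually combined in the project is not specified in this guide, so the simple average below is only an illustration:

```python
def confidence_score(relevance: float, reliability: float) -> float:
    # How the two scores are combined is an assumption here; a simple
    # average is used purely for illustration.
    return (relevance + reliability) / 2


def route_query(confidence: float) -> str:
    """Map a confidence score to one of the three processing paths."""
    if confidence > 0.7:
        return "direct_knowledge_refinement"  # high confidence
    elif confidence >= 0.3:
        return "hybrid"                       # medium confidence
    else:
        return "web_search_fallback"          # low confidence
```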
- Confidence-Based Routing: Intelligently routes queries based on document relevance
- Knowledge Strip Decomposition: Extracts and filters relevant information pieces
- Dynamic Web Search Fallback: Uses web search when document knowledge is insufficient
- Document Evaluation: Explicitly evaluates document relevance and reliability
- Contextual Compression: Uses embeddings-based filtering and LLM extraction to improve retrieval quality
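The contextual compression step can be assembled with LangChain's compression retriever. A rough sketch, assuming the FAISS index already exists on disk; the index path, similarity threshold, and model names are placeholders rather than the project's actual settings:

```python
from langchain_community.vectorstores import FAISS
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import (
    DocumentCompressorPipeline,
    EmbeddingsFilter,
    LLMChainExtractor,
)

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
vector_store = FAISS.load_local(
    "data/embeddings", embeddings, allow_dangerous_deserialization=True
)
llm = ChatGroq(model="llama3-8b-8192")  # placeholder model; reads GROQ_API_KEY from the environment

# Drop chunks that are not similar enough to the query, then let the LLM
# extract only the relevant passages from the chunks that survive.
embeddings_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.75)
llm_extractor = LLMChainExtractor.from_llm(llm)
compressor = DocumentCompressorPipeline(transformers=[embeddings_filter, llm_extractor])

retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vector_store.as_retriever(search_kwargs={"k": 5}),
)
docs = retriever.invoke("What does the author say about deliberate practice?")
```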
Before starting, ensure you have the following tools installed:
- Python 3.9+ for the backend
- Node.js 18+ for the frontend
- Git (optional)
- PDF books you want to include in your knowledge base
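Create the project directories: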
mkdir personal-knowledge-assistant
cd personal-knowledge-assistant
mkdir backend frontend
Organize your project directory as follows:
backend/
├── app/
│ ├── main.py
│ ├── api/
│ │ ├── __init__.py
│ │ ├── routes/
│ │ │ ├── __init__.py
│ │ │ └── chat.py
│ │
│ ├── core/
│ │ ├── __init__.py
│ │ ├── config.py
│ │ └── security.py
│ │
│ ├── db/
│ │ ├── __init__.py
│ │ └── vector_store.py
│ │
│ ├── models/
│ │ ├── __init__.py
│ │ └── schemas.py
│ │
│ ├── services/
│ │ ├── __init__.py
│ │ ├── rag.py
│ │ └── llm.py
│ │
│ └── utils/
│ ├── __init__.py
│ └── text_processing.py
│
├── data/
│ └── embeddings/
│
├── ingest.py
├── requirements.txt
└── .env
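Set up a Python virtual environment: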
cd backend
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Create requirements.txt with the following content
Add the following dependencies to your requirements.txt:
fastapi
uvicorn
pydantic
pydantic-settings
langchain
langchain-groq
langchain-community
langchain-huggingface
faiss-cpu
python-dotenv
pypdf
sentence-transformers
Then install the dependencies:
pip install -r requirements.txt
Create a .env file in the backend directory:
GROQ_API_KEY=your_groq_api_key_here
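The settings module at app/core/config.py typically loads this key (along with the tunable values referenced in the customization section below) via pydantic-settings. A minimal sketch; the exact fields and defaults in the real config.py may differ:

```python
# app/core/config.py (sketch)
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    # Loaded from the .env file created above.
    GROQ_API_KEY: str

    # Defaults mirror the examples in the customization section of this guide.
    LLM_MODEL: str = "llama3-8b-8192"
    EMBEDDING_MODEL: str = "sentence-transformers/all-mpnet-base-v2"
    CHUNK_SIZE: int = 1000
    CHUNK_OVERLAP: int = 200
    TOP_K_RESULTS: int = 5

    model_config = SettingsConfigDict(env_file=".env")


settings = Settings()
```

Next, create the empty __init__.py files so each directory is treated as a Python package: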
touch app/__init__.py
touch app/api/__init__.py
touch app/api/routes/__init__.py
touch app/core/__init__.py
touch app/db/__init__.py
touch app/models/__init__.py
touch app/services/__init__.py
touch app/utils/__init__.py
Place your PDF books in a directory and ingest them:
mkdir books
# Copy your PDF books into the books directory
python ingest.py --dir books
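ingest.py is the script invoked above. A rough sketch of what it does, following the ingestion pipeline described at the start of this guide (the index path, model name, and chunk settings are illustrative and should match your config.py):

```python
# ingest.py (sketch)
import argparse
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


def main() -> None:
    parser = argparse.ArgumentParser(description="Ingest PDFs into the vector store")
    parser.add_argument("--dir", required=True, help="Directory containing PDF files")
    args = parser.parse_args()

    # Load every PDF in the given directory.
    documents = []
    for pdf_path in sorted(Path(args.dir).glob("*.pdf")):
        documents.extend(PyPDFLoader(str(pdf_path)).load())

    # Split the documents into overlapping chunks.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    chunks = splitter.split_documents(documents)

    # Embed the chunks and persist the FAISS index.
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
    vector_store = FAISS.from_documents(chunks, embeddings)
    vector_store.save_local("data/embeddings")  # illustrative path


if __name__ == "__main__":
    main()
```

With the books ingested, start the backend server: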
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
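The uvicorn command expects app/main.py to expose a FastAPI instance named app. A minimal sketch with CORS enabled for the Next.js dev server; it assumes chat.py defines an APIRouter named router, so adjust it to match the provided code:

```python
# app/main.py (sketch)
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.api.routes import chat  # assumes chat.py defines `router = APIRouter(...)`

app = FastAPI(title="Personal Knowledge Assistant")

# Allow the Next.js dev server to call the API during development.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# The /api prefix matches the NEXT_PUBLIC_API_URL used by the frontend.
app.include_router(chat.router, prefix="/api")
```

With the backend running, set up the frontend: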
cd ../frontend
npx create-next-app@latest .
# Select Yes for TypeScript
# Select Yes for ESLint
# Select Yes for Tailwind CSS
# Select Yes for src/ directory
# Select Yes for App Router
# Select Yes for import alias
npm install lucide-react react-markdown
npx shadcn-ui@latest init
# Select Default for style
# Select Default for baseColor
# Select Yes for CSS variables
# Use App dir structure
# Select src/components for components directory
# Select @/components for import alias
# Select Yes for React Server Components
# Select Yes for tailwind.config.ts
# Select @/lib/utils for utils
# Install the required components
npx shadcn-ui@latest add button textarea card
Create a .env.local file in the frontend directory:
NEXT_PUBLIC_API_URL=http://localhost:8000/api
Replace the contents of the following files with the provided code:
src/app/page.tsx
src/app/layout.tsx
src/app/globals.css
tailwind.config.ts
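Then start the development server: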
npm run dev
Your application should now be running at http://localhost:3000.
- Navigate to http://localhost:3000 in your web browser.
- Ask questions about the books you've ingested.
- The application will search through the book content and provide relevant answers.
If you encounter issues with the vector store:
rm -rf data/vector_store
python ingest.py --dir books
If the frontend can't connect to the backend:
- Ensure the backend is running on port 8000.
- Check that CORS is properly configured.
- Verify your .env.local file has the correct API URL.
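To check the backend independently of the frontend, you can call the API directly. The route and payload below are assumptions based on the file names in this guide (chat.py, schemas.py), so adjust them to match your actual endpoint:

```python
# Quick connectivity check against the backend (endpoint and payload are assumptions).
import json
import urllib.request

payload = json.dumps({"query": "What is this book about?"}).encode("utf-8")
request = urllib.request.Request(
    "http://localhost:8000/api/chat",  # assumed route; match your chat.py
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(response.status, response.read().decode("utf-8"))
```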
If you encounter authentication errors:
- Double-check your Groq API key in the .env file.
- Ensure your HuggingFace token has the necessary permissions.
To change the LLM model, edit app/core/config.py:
LLM_MODEL: str = "your-preferred-model" # e.g., "llama3-8b-8192" for a smaller model
Edit app/core/config.py to customize the RAG behavior:
CHUNK_SIZE: int = 1000 # Increase for larger contexts
CHUNK_OVERLAP: int = 200 # Adjust to reduce information loss at chunk boundaries
TOP_K_RESULTS: int = 5 # Increase for more comprehensive context
Edit app/core/config.py to use a different embedding model:
EMBEDDING_MODEL: str = "your-preferred-embedding-model" # e.g., "sentence-transformers/all-mpnet-base-v2"
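Because the vectors stored in FAISS are tied to the model that produced them, re-run ingestion after changing EMBEDDING_MODEL (delete the existing index first, as shown in the troubleshooting steps above). Wherever the store is built or loaded, the setting is typically passed straight through, roughly like this (the settings import path is an assumption):

```python
# Sketch: constructing embeddings from the configured model name.
from langchain_huggingface import HuggingFaceEmbeddings

from app.core.config import settings  # assumed import path for the Settings instance

embeddings = HuggingFaceEmbeddings(model_name=settings.EMBEDDING_MODEL)
```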