This project loads a PDF file, splits it into manageable chunks, creates embeddings with Sentence Transformers, stores them in ChromaDB, and retrieves the passages most relevant to a question you ask about the content.
- Python 🐍
- langchain
- langchain-chroma
- sentence-transformers==4.1.0
- chromadb
- transformers==4.52.4
- torch==2.7.1
Clone the repository:
git clone https://github.com/solanki505/RAG-Vector-embeddings.git
cd RAG-Vector-embeddings
Install dependencies:
pip install -r requirements.txt
Or manually:
pip install langchain langchain-chroma chromadb sentence-transformers==4.1.0 transformers==4.52.4 torch==2.7.1
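After installing, it can help to confirm that the pinned packages resolved to the expected versions. The snippet below is a minimal sketch using only the standard library; `PINNED` and `check_versions` are hypothetical names introduced for illustration and are not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the install command above (illustrative subset).
PINNED = {"transformers": "4.52.4", "torch": "2.7.1"}


def check_versions(pinned: dict[str, str]) -> dict[str, str]:
    """Return a status string per package: 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # Some builds carry a local suffix such as "+cu126"; compare the base version only.
        base = installed.split("+")[0]
        report[name] = "ok" if base == wanted else f"mismatch (installed {installed})"
    return report


for name, status in check_versions(PINNED).items():
    print(f"{name}: {status}")
```

A `missing` or `mismatch` line usually means the install ran in a different environment than the one executing the script.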
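from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFLoader  # requires the pypdf package
from langchain_community.embeddings.sentence_transformer import SentenceTransformerEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load the PDF; PyPDFLoader returns one Document per page
loader = PyPDFLoader("paul-graham-ideas.pdf")
documents = loader.load()

# Split the pages into chunks of at most 100 characters, with up to 30 characters of overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=30)
docs = text_splitter.split_documents(documents)

# Embed each chunk with all-MiniLM-L6-v2 and index the vectors in an in-memory Chroma collection
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma.from_documents(docs, embedding_function)

# Retrieve the chunks most similar to the question and print the best match
query = "This essay is derived from where?"
results = db.similarity_search(query)
print(results[0].page_content)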
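To make the `chunk_size=100, chunk_overlap=30` settings and the similarity search more concrete, here is a dependency-free sketch of the two ideas. It is a simplification under stated assumptions: `chunk_text` uses a fixed sliding window, whereas `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraphs and spaces, and `most_similar` ranks by cosine similarity, which may not match the distance metric your Chroma collection is configured with. All function names and the toy vectors are hypothetical.

```python
import math


def chunk_text(text: str, chunk_size: int = 100, chunk_overlap: int = 30) -> list[str]:
    """Fixed sliding window: each chunk starts (chunk_size - chunk_overlap) characters after the last."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_vec: list[float], chunk_vecs: list[list[float]]) -> int:
    """Index of the chunk vector with the highest cosine similarity to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in chunk_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# 250 characters with a 70-character step give three full chunks and a 40-character tail.
print([len(c) for c in chunk_text("x" * 250)])  # [100, 100, 100, 40]

# Toy 2-D vectors stand in for real embeddings; the second one points closest to the query.
print(most_similar([1.0, 0.0], [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]))  # 1
```

The overlap means the last 30 characters of one chunk reappear at the start of the next, so a sentence cut at a chunk boundary is still retrievable from one of the two neighbours.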
- Query academic or research PDFs
- Build educational or legal Q&A bots
- Extract insights from any long document
RAG-Vector-embeddings/
│
├── paul-graham-ideas.pdf
├── main.py
├── requirements.txt
└── README.md
Made with ❤️ by Solanki Sarkar