A complete Retrieval-Augmented Generation (RAG) system, plus an OCR-to-Markdown & Image Q&A module, built with Streamlit and Haystack.
-
Agentic RAG Chat
- Hybrid retrieval (OpenAI embeddings + BM25)
- Contextual query routing & summarization
- Stateful chat sessions saved to SQLite
-
BM25-Only Document Search
- Quick keyword-driven document lookup
-
OCR to Markdown Converter
- Uses Together AI vision models to extract full-page content as Markdown
-
Image Question-Answering
- Ask natural-language questions about any uploaded or URL’d image
- Python 3.8+
- An OpenAI API key
- A Together AI API key (for OCR & Image Q&A)
- Git & your favorite terminal/shell
-
Clone the repo
git clone https://github.com/Esmail-ibraheem/RAG-ecosystem.git cd RAG-ecosystem
-
Create & activate a virtual environment
python3 -m venv .venv source .venv/bin/activate # macOS/Linux .venv\Scripts\activate # Windows
-
Install dependencies
pip install streamlit pip install "haystack[all]" # Core RAG components pip install openai sqlalchemy pandas python-docx pillow requests python-dotenv
Create a .env
file in the project root with your API keys:
OPENAI_API_KEY=sk-…
TOGETHER_API_KEY=sk-…
Or set them in your shell:
export OPENAI_API_KEY=sk-…
export TOGETHER_API_KEY=sk-…
Launch the main RAG app:
streamlit run RAG.py
-
Sidebar
- Enter your OpenAI API key.
- Pick a GPT model (e.g.
gpt-3.5-turbo
orgpt-4-turbo
). - Upload documents (
.pdf
,.docx
,.txt
,.csv
,.xlsx
) for RAG or BM25. - Start or load chat sessions.
-
Main panel
- For “RAG Chat”: chat with the system—behind the scenes it chooses between summary, context-driven answer, or simple reply.
- For “BM25 Search”: run keyword searches and preview top‐k results.
All chat history is stored in chat_history.db
(SQLite) for later reuse.
Run the OCR & QA utility:
streamlit run ocr_processor.py
-
Convert to Markdown
- Upload or URL-point to an image.
- Click Convert to Markdown to get full‐page Markdown.
-
Ask About the Image
- Enter a natural-language question.
- Click Get Answer to see the model’s response.
.
├── RAG.py # Agentic RAG & BM25 Search Streamlit app
├── ocr_processor.py # OCR → Markdown & Image Q&A Streamlit app
├── utils/
│ └── custom_converters.py # Docx/.xlsx → Haystack Document converters
├── chat_history.db # SQLite DB (auto-generated)
├── LICENSE # MIT License
└── .gitignore
- Fork the repo
- Create a feature branch (
git checkout -b feat/my-feature
) - Commit your changes (
git commit -m "Add …"
) - Push to your branch (
git push
) - Open a Pull Request!
Please follow the existing code style and include tests/examples where applicable.
This project is licensed under the MIT License. See the LICENSE file for details.