RAG-ecosystem

A complete Retrieval-Augmented Generation (RAG) system, plus an OCR-to-Markdown & Image Q&A module, built with Streamlit and Haystack.

Features

Agentic RAG Chat
- Hybrid retrieval (OpenAI embeddings + BM25)
- Contextual query routing & summarization
- Stateful chat sessions saved to SQLite
BM25-Only Document Search
- Quick keyword-driven document lookup
OCR to Markdown Converter
- Uses Together AI vision models to extract full-page content as Markdown
Image Question-Answering
- Ask natural-language questions about any uploaded or URL’d image

Getting Started

Prerequisites

Python 3.8+
An OpenAI API key
A Together AI API key (for OCR & Image Q&A)
Git & your favorite terminal/shell

Installation

Clone the repo

git clone https://github.com/Esmail-ibraheem/RAG-ecosystem.git
cd RAG-ecosystem

Create & activate a virtual environment

python3 -m venv .venv
source .venv/bin/activate    # macOS/Linux
.venv\Scripts\activate       # Windows

Install dependencies

pip install streamlit
pip install "haystack[all]"   # Core RAG components
pip install openai sqlalchemy pandas python-docx pillow requests python-dotenv

Configuration

Create a .env file in the project root with your API keys:

OPENAI_API_KEY=sk-…
TOGETHER_API_KEY=sk-…

Or set them in your shell:

export OPENAI_API_KEY=sk-…
export TOGETHER_API_KEY=sk-…

Usage

1. Agentic RAG Chat & BM25 Search

Launch the main RAG app:

streamlit run RAG.py

Sidebar
- Enter your OpenAI API key.
- Pick a GPT model (e.g. gpt-3.5-turbo or gpt-4-turbo).
- Upload documents (.pdf, .docx, .txt, .csv, .xlsx) for RAG or BM25.
- Start or load chat sessions.
Main panel
- For “RAG Chat”: chat with the system—behind the scenes it chooses between summary, context-driven answer, or simple reply.
- For “BM25 Search”: run keyword searches and preview top‐k results.

All chat history is stored in chat_history.db (SQLite) for later reuse.

2. OCR to Markdown & Image Q&A

Run the OCR & QA utility:

streamlit run ocr_processor.py

Convert to Markdown
- Upload or URL-point to an image.
- Click Convert to Markdown to get full‐page Markdown.
Ask About the Image
- Enter a natural-language question.
- Click Get Answer to see the model’s response.

Project Structure

.
├── RAG.py                 # Agentic RAG & BM25 Search Streamlit app
├── ocr_processor.py       # OCR → Markdown & Image Q&A Streamlit app
├── utils/
│   └── custom_converters.py  # Docx/.xlsx → Haystack Document converters
├── chat_history.db        # SQLite DB (auto-generated)
├── LICENSE                # MIT License
└── .gitignore

Contributing

Fork the repo
Create a feature branch (git checkout -b feat/my-feature)
Commit your changes (git commit -m "Add …")
Push to your branch (git push)
Open a Pull Request!

Please follow the existing code style and include tests/examples where applicable.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

RAG-ecosystem

Table of Contents

Features

Getting Started

Prerequisites

Installation

Configuration

Usage

1. Agentic RAG Chat & BM25 Search

2. OCR to Markdown & Image Q&A

Project Structure

Contributing

License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
RAG.py		RAG.py
README.md		README.md
ocr_processor.py		ocr_processor.py

License

YemenOpenSource/RAG-ecosystem

Folders and files

Latest commit

History

Repository files navigation

RAG-ecosystem

Table of Contents

Features

Getting Started

Prerequisites

Installation

Configuration

Usage

1. Agentic RAG Chat & BM25 Search

2. OCR to Markdown & Image Q&A

Project Structure

Contributing

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages