Intelligent Document Processing with RAG-Powered Conversations
Nimbus is your Document Mind - a sophisticated AI system that reads, understands, and converses about your documents using advanced RAG (Retrieval-Augmented Generation) technology. Transform any collection of documents into an intelligent knowledge base that you can chat with naturally. Built with Flask, PostgreSQL with pgvector, and Ollama for local LLM inference.
Nimbus doesn't just store your documents; it understands them. Ask questions in natural language and get answers grounded in your actual content, with full source citations.
- Multi-Model Embeddings: Support for multiple embedding models simultaneously (nomic-embed-text, mxbai-embed-large, all-minilm)
- Intelligent Retrieval: Query multiple embedding models and merge results with deduplication
- Context-Aware Responses: The LLM answers only from your documents, reducing hallucinations
- Source Citations: Track which documents were used to generate each answer
- Multiple Parsers:
  - PyMuPDF (fast, standard PDFs)
  - PDFPlumber (tables and structured data)
  - Unstructured (advanced layout detection)
  - OCR Parser (scanned documents, images)
  - OCR + Vision (AI-powered image description using LLaVA)
- Smart Text Splitting:
  - Recursive Character Splitter (balanced chunks)
  - Token-based Splitter (LLM-optimized)
  - Semantic Splitter (embedding-based, natural boundaries)
- Session Management: Organize conversations by topic
- Persistent History: All chats saved to database
- Multi-Model Support: Switch between different LLMs
- Conversation History: Maintains context across messages
- Sidebar Navigation: Quick access to all chat sessions
- Role-based access control (Admin/User)
- Secure password hashing with bcrypt
- User creation, deletion, and password management
- Per-user document isolation
- Custom Nimbus branding with professional logos
- Responsive Bootstrap 5 design
- Dark/light theme support
- Real-time status updates
- Drag-and-drop file upload
- Document preview functionality
Nimbus transforms your documents into an intelligent, searchable knowledge base:
- 📄 Ingestion: Upload documents in various formats (PDF, DOCX, TXT, etc.)
- 🔍 Understanding: Advanced parsers extract text, including OCR for scanned documents
- ✂️ Chunking: Smart text splitting creates semantically meaningful pieces
- 🧭 Vectorization: Multiple embedding models create rich vector representations
- 💬 Conversation: Chat naturally - Nimbus retrieves relevant information and responds intelligently
- 📋 Citation: Every answer includes source references to maintain trust and accuracy
🧠 Nimbus Document Mind
```
┌─────────────────┐
│   Web Browser   │
└────────┬────────┘
         │
┌────────▼────────────────────────────────────┐
│     Flask Application (Document Mind)       │
│  ┌──────────┬──────────┬──────────────┐     │
│  │   Chat   │Documents │    Users     │     │
│  │ Blueprint│Blueprint │  Blueprint   │     │
│  └──────────┴──────────┴──────────────┘     │
└────────┬────────────────────────────────────┘
         │
    ┌────┴────┐
    │         │
┌───▼──┐  ┌──▼──────────┐
│Ollama│  │ PostgreSQL  │
│ LLMs │  │ + pgvector  │
└──────┘  └─────────────┘
💭 AI Mind  🧠 Memory Bank
```
Tech Stack:
- Backend: Flask (Python)
- Database: PostgreSQL 16 + pgvector extension
- LLM/Embeddings: Ollama (local inference)
- Document Processing: PyMuPDF, PDFPlumber, Tesseract OCR, Pillow
- Text Splitting: LangChain, custom implementations
- Frontend: Bootstrap 5, vanilla JavaScript
- Containerization: Docker & Docker Compose
- Docker and Docker Compose installed
- Python 3.12+ (if running without Docker)
- Ollama installed and running with desired models
- Clone the repository
```bash
git clone <your-repo-url>
cd nimbus
```
- Choose your deployment method
Option A: Full Docker Deployment (Recommended for Production)
```bash
# Starts Nimbus app + PostgreSQL + Ollama
docker compose up -d
```
Option B: Development Setup (Local Python + Docker Services)
```bash
# Only starts PostgreSQL + Ollama; run Nimbus locally
docker compose -f docker-compose.dev.yml up -d
```
- If using Option B (Development), set up Python environment
```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```
- Configure environment variables (Optional)
Create a `.env` file in the project root:
```
# Flask Configuration
FLASK_SECRET_KEY=your-secure-secret-key-here
FLASK_ENV=development
FLASK_DEBUG=true
APP_HOST=0.0.0.0
APP_PORT=8000

# Database
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/nimbus

# Ollama
OLLAMA_URL=http://localhost:11434

# Default Models
DEFAULT_EMBEDDING_MODEL=nomic-embed-text
```
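For reference, here is a minimal sketch of how `config.py` might load these variables, assuming `python-dotenv` is used; the exact loading code and defaults in Nimbus may differ:

```python
# Hypothetical sketch: read .env values with python-dotenv.
# Variable names match the .env example above; defaults are assumptions.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the project root

DATABASE_URL = os.getenv(
    "DATABASE_URL", "postgresql://postgres:postgres@localhost:5432/nimbus")
OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
DEFAULT_EMBEDDING_MODEL = os.getenv("DEFAULT_EMBEDDING_MODEL", "nomic-embed-text")
```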
- Run the application (Development mode only)
```bash
# Only needed for Option B
python app.py
```
- Access the application
Open your browser and navigate to `http://localhost:8000`.
Default credentials:
- Username: `admin`
- Password: `admin123`
⚠️ Security Note: Change the default password immediately after first login!
Screenshots:
- Nimbus login screen with professional branding
- Main dashboard showing the Document Mind interface
- Easy drag-and-drop document upload interface
- View and manage your uploaded documents
- Choose the appropriate parser for your document type
- Generate embeddings with different models
- Monitor document processing progress
- Natural language chat with your documents
📝 Note: For detailed usage instructions with step-by-step screenshots, see the Usage Guide.
Navigate to the Documents page:
- Click "Upload Document" or drag & drop files
- Supported formats: PDF, TXT, MD, DOCX, PPTX
- Files are associated with your user account
Choose a parser based on your document type:
- PyMuPDF: Best for standard PDFs with selectable text
- PDFPlumber: Excellent for tables and structured data
- Unstructured: Advanced layout analysis
- OCR: For scanned documents or images
- OCR + Vision: Combines text extraction with AI image description
Select a splitting strategy:
- Recursive: Balanced chunks with configurable size/overlap
- Token-based: Optimized for LLM token limits
- Semantic: Uses embeddings to find natural breakpoints
Choose embedding models to create:
- `nomic-embed-text`: Fast, efficient
- `mxbai-embed-large`: High accuracy
- `all-minilm`: Compact, good for large datasets
💡 Tip: Generate multiple embedding models for better retrieval!
Toggle a document's "enabled" switch to include it in the RAG context
Go to the Chat page:
- Select a chat model (e.g., `llama3.2`, `qwen2.5`)
- Start asking questions about your documents
- The AI will retrieve relevant chunks and cite sources (see the sketch after this list)
- Create multiple sessions to organize conversations
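To show how retrieved chunks and source citations come together, here is a minimal, hypothetical sketch of a grounded prompt builder (the wording and structure are illustrative, not Nimbus's actual template):

```python
# Hypothetical sketch: assemble retrieved chunks into a grounded prompt.
def build_prompt(question: str, chunks: list[dict]) -> str:
    """chunks: [{'content': ..., 'source': ...}, ...] from retrieval."""
    context = "\n\n".join(
        f"[{i + 1}] (source: {c['source']})\n{c['content']}"
        for i, c in enumerate(chunks)
    )
    return (
        "Answer using ONLY the context below. If the answer is not in the "
        "context, say you don't know. Cite sources as [n].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Instructing the model to answer only from the numbered context is what keeps responses grounded and citable.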
All configuration is centralized in `config.py`. Key settings:
```python
RAG_TOP_K_PER_MODEL = 5      # Top chunks per embedding model
RAG_TOP_K_OVERALL = 10       # Total chunks to include in context
RAG_SNIPPET_MAX_CHARS = 800  # Max characters per snippet
DEFAULT_CHUNK_SIZE = 1000    # Characters per chunk
DEFAULT_CHUNK_OVERLAP = 200  # Overlap between chunks
```
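With these defaults, the context sent to the LLM is capped at roughly `RAG_TOP_K_OVERALL` × `RAG_SNIPPET_MAX_CHARS` = 10 × 800 = 8,000 characters, which fits comfortably within typical LLM context windows.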
Define which embedding tables to query for each chat model:
```python
MODEL_EMBEDDING_TABLE_MAP = {
    'llama3:latest': [
        {'table': 'document_embeddings_nomic_embed_text', 'embedding_model': 'nomic-embed-text'},
        {'table': 'document_embeddings_mxbai_embed_large', 'embedding_model': 'mxbai-embed-large'},
    ]
}
```
Full Stack Deployment:
```bash
# Complete deployment with all services
docker compose up -d
```
Development Setup:
```bash
# Only database and Ollama (run Nimbus locally)
docker compose -f docker-compose.dev.yml up -d
python app.py
```
What's Included:
- 🐘 PostgreSQL with pgvector - Vector database for embeddings
- 🤖 Ollama - Local LLM inference server
- 🧠 Nimbus App - Document Mind application (full deployment only)
- 🗄️ Persistent volumes - Data survives container restarts
- 🌐 Internal networking - Services communicate securely
Database initialization scripts in `db/init/`:
- `01_init.sql`: Creates the users table, pgvector extension, and default admin user
- `02_chat_tables.sql`: Creates chat sessions and messages tables with triggers
See DEPLOYMENT.md for detailed deployment options and production setup.
Nimbus queries multiple embedding models simultaneously and intelligently merges the results (sketched below):
- Computes embeddings for user query with each configured model
- Retrieves top-K chunks from each embedding table
- Deduplicates based on content
- Ranks by similarity score
- Sends top results to LLM as context
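A minimal sketch of this retrieve-and-merge step, assuming `requests`, `psycopg2`, Ollama's `/api/embeddings` endpoint, and pgvector's cosine-distance operator `<=>`; the `tables` argument mirrors one entry of `MODEL_EMBEDDING_TABLE_MAP`, while column names are illustrative rather than Nimbus's actual schema:

```python
# Hypothetical sketch of multi-model retrieval; schema and helper names
# are assumptions, not Nimbus's actual code.
import requests

OLLAMA_URL = "http://localhost:11434"

def embed(text: str, model: str) -> list[float]:
    """Fetch an embedding vector for `text` from Ollama."""
    resp = requests.post(f"{OLLAMA_URL}/api/embeddings",
                         json={"model": model, "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def retrieve(conn, query: str, tables: list[dict],
             top_k_per_model: int = 5, top_k_overall: int = 10) -> list[str]:
    """Query each embedding table, deduplicate by content, rank by distance."""
    candidates: dict[str, float] = {}
    for cfg in tables:  # e.g. one entry of MODEL_EMBEDDING_TABLE_MAP
        vec = embed(query, cfg["embedding_model"])
        literal = "[" + ",".join(map(str, vec)) + "]"  # pgvector text format
        with conn.cursor() as cur:
            # <=> is pgvector's cosine-distance operator (smaller = closer);
            # the table name comes from trusted config, not user input.
            cur.execute(
                f"SELECT content, embedding <=> %s::vector AS dist "
                f"FROM {cfg['table']} ORDER BY dist LIMIT %s",
                (literal, top_k_per_model),
            )
            for content, dist in cur.fetchall():
                # Deduplicate on content, keeping the best distance seen
                if content not in candidates or dist < candidates[content]:
                    candidates[content] = dist
    ranked = sorted(candidates.items(), key=lambda kv: kv[1])
    return [content for content, _ in ranked[:top_k_overall]]
```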
For image-heavy or scanned documents, the OCR pipeline (sketched below):
- Converts PDF pages to images (300 DPI for OCR)
- Extracts text using Tesseract OCR
- Optionally uses LLaVA vision model to describe images
- Combines textual and visual information
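A minimal sketch of this path, assuming PyMuPDF (`fitz`), `pytesseract`, and Pillow are installed; the optional LLaVA description step is omitted:

```python
# Hypothetical sketch of the OCR path; function names are illustrative.
import io

import fitz  # PyMuPDF
import pytesseract
from PIL import Image

def ocr_pdf(path: str, dpi: int = 300) -> str:
    """Rasterize each PDF page at `dpi` and extract text with Tesseract."""
    doc = fitz.open(path)
    pages = []
    for page in doc:
        pix = page.get_pixmap(dpi=dpi)  # render the page to a raster image
        img = Image.open(io.BytesIO(pix.tobytes("png")))
        pages.append(pytesseract.image_to_string(img))
    return "\n\n".join(pages)
```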
The semantic splitter uses embeddings to find natural boundaries (illustrated below):
- Calculates similarity between consecutive sentences
- Splits at points where similarity drops (semantic shift)
- Creates more coherent chunks than arbitrary character counts
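A minimal sketch of the idea, again using Ollama embeddings; the sentence regex and the 0.75 similarity threshold are illustrative assumptions:

```python
# Hypothetical sketch of semantic splitting; threshold and sentence
# detection are assumptions, not Nimbus's actual implementation.
import math
import re

import requests

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    resp = requests.post("http://localhost:11434/api/embeddings",
                         json={"model": model, "prompt": text})
    resp.raise_for_status()
    return resp.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_split(text: str, threshold: float = 0.75) -> list[str]:
    """Start a new chunk wherever consecutive sentences diverge semantically."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev, cur, sent in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:  # similarity drop = semantic shift
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
    chunks.append(" ".join(current))
    return chunks
```

A higher threshold splits more aggressively; a lower one yields fewer, larger chunks.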
```
nimbus/
├── app.py                # Main Flask application
├── config.py             # Centralized configuration
├── requirements.txt      # Python dependencies
├── docker-compose.yml    # Docker setup
│
├── apps/                 # Modular blueprints
│   ├── chat/             # Chat interface & RAG logic
│   ├── documents/        # Document management
│   │   ├── parsers/      # PDF/OCR parsers
│   │   └── splitters/    # Text splitting strategies
│   └── users/            # User management
│
├── db/                   # Database
│   ├── init/             # SQL initialization scripts
│   └── chat.db           # SQLite (if using)
│
├── templates/            # HTML templates
├── static/               # CSS, JS, images
└── uploads/              # User-uploaded files
```
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Please ensure your code follows the existing style and includes appropriate documentation.
- Hybrid Retrieval + Re-ranker: Combine dense and sparse retrieval with sophisticated re-ranking algorithms
- ANN Indexing for pgvector: Implement HNSW/IVFFlat indexing for faster similarity search at scale
- Semantic Caching & Query Expansion: Cache embeddings and expand queries for better retrieval coverage
- Security Hardening: Move beyond default credentials with OAuth, RBAC, API keys, and audit logging
- Async Ingestion Pipeline: Background processing with job queues for large document batches
- Advanced Analytics Dashboard: Usage metrics, performance monitoring, and system insights
- More File Types + Table Extraction: Excel, CSV, HTML, PowerPoint with advanced table parsing
- Multi-language Support: International document processing and multilingual embeddings
- Batch Document Processing: Efficient handling of large document collections
- API Endpoints: RESTful API for programmatic access and third-party integrations
- Export Chat Conversations: Export functionality for conversations and knowledge artifacts
- Docker Image: Complete containerized application for easy deployment
- Advanced User Management: Organizations, teams, and granular permissions
- Document Versioning: Track changes and maintain document history
- Audit Trails: Complete logging for compliance and monitoring
- Custom Model Integration: Support for private/custom LLM and embedding models
- Default admin login: `admin` / `admin123`
- See SECURITY.md for complete security guidelines
- Follow the security checklist before going live
For security issues, please report responsibly to maintainers directly.
This project is licensed under the MIT License - see the LICENSE file for details.
- Ollama for local LLM inference
- pgvector for PostgreSQL vector extension
- LangChain for semantic splitting utilities
- Bootstrap for UI components
- Tesseract OCR for text extraction
For questions, issues, or feature requests:
- Open an issue on GitLab
- Check the existing documentation in the `/docs` folder
- Review the configuration guide in `CONFIGURATION_GUIDE.md`
Built with ❤️ to be your intelligent Document Mind - transforming how you interact with knowledge 🧠📄