Skip to content

carnotresearch/new-qdoc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

18 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

icarKno - Document Intelligence Platform

An AI-powered document analysis platform that enables intelligent querying, summarization, and knowledge extraction from various document formats.

Python Flask License

✨ Features

πŸ“„ Document Processing

  • Multi-format Support: PDF, DOCX, DOC, TXT, PPTX, CSV, XLSX
  • Intelligent Chunking: Hierarchical text segmentation for optimal retrieval
  • Vector Embeddings: Advanced semantic search using OpenAI embeddings

πŸ” Smart Search & Retrieval

  • Hybrid Search: Combines keyword and vector search for best results
  • Contextual Answers: AI-powered responses with source citations
  • Creative Reasoning: Advanced multi-step analysis for complex queries

🧠 AI Capabilities

  • Document Summarization: Extractive and abstractive summaries
  • Question Answering: Natural language queries with source attribution
  • Knowledge Graphs: Visual representation of document relationships
  • Multi-language: Support for 23+ languages including Hindi, Tamil, Bengali

πŸ“Š Data Integration

  • Structured Data: Query CSV/Excel files using natural language
  • SQL Generation: Automatic conversion of questions to database queries
  • Hybrid Analysis: Combine document insights with data analytics

🌐 Integration & Access

  • REST API: Complete programmatic access
  • WhatsApp Bot: Chat with your documents via WhatsApp
  • Web Interface: User-friendly document upload and query interface
  • Authentication: JWT-based secure access with subscription tiers

πŸš€ Quick Start

Prerequisites

  • Python 3.8 or higher
  • Elasticsearch cluster (cloud or self-hosted)
  • OpenAI API access
  • MySQL database
  • MongoDB instance

Installation

  1. Clone the repository

    git clone https://github.com/your-org/icarkno.git
    cd icarkno
  2. Create virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Environment setup

    cp .env.example .env
    # Edit .env with your configuration
  5. Configure environment variables

    # Elasticsearch
    ES_CLOUD_ID=your-elasticsearch-cloud-id
    ES_API_KEY=your-elasticsearch-api-key
    
    # OpenAI
    OPENAI_API_KEY=your-openai-api-key
    
    # Database
    MONGO_URL=mongodb://localhost:27017/icarkno
    MYSQL_HOST=localhost
    MYSQL_USERNAME=your-username
    MYSQL_PASSWORD=your-password
    
    # Security
    JWT_SECRET_KEY=your-jwt-secret
    SECRET_KEY=your-flask-secret
    
    # Neo4j (for knowledge graphs)
    NEO4J_URI=bolt://localhost:7687
    NEO4J_USERNAME=neo4j
    NEO4J_PASSWORD=your-password
  6. Run the application

    python run.py

The API will be available at http://localhost:5000

πŸ“– Usage

Basic Document Processing

# Upload documents
curl -X POST http://localhost:5000/upload \
  -F "[email protected]" \
  -F "token=your-jwt-token" \
  -F "sessionId=session123"

# Query documents
curl -X POST http://localhost:5000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "token": "your-jwt-token",
    "sessionId": "session123",
    "message": "What are the key findings?",
    "context": true
  }'

Free Trial Mode

# Upload for trial users
curl -X POST http://localhost:5000/freeTrial \
  -F "[email protected]" \
  -F "fingerprint=unique-browser-fingerprint"

# Query in trial mode
curl -X POST http://localhost:5000/trialAsk \
  -H "Content-Type: application/json" \
  -d '{
    "fingerprint": "unique-browser-fingerprint",
    "message": "Summarize this document"
  }'

Advanced Features

  • Creative Mode: Add "mode": "creative" for multi-step reasoning
  • Knowledge Graphs: Use /create_graph endpoint for visual relationships
  • Multi-language: Set "outputLanguage": 1 for Hindi responses
  • Data Queries: Upload CSV/Excel and query with natural language

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Client Apps   β”‚    β”‚   Flask API      β”‚    β”‚   AI Services   β”‚
β”‚                 │────│                  │────│                 β”‚
β”‚ β€’ Web Interface β”‚    β”‚ β€’ Authentication β”‚    β”‚ β€’ OpenAI GPT    β”‚
β”‚ β€’ WhatsApp Bot  β”‚    β”‚ β€’ Rate Limiting  β”‚    β”‚ β€’ Embeddings    β”‚
β”‚ β€’ Mobile App    β”‚    β”‚ β€’ File Processingβ”‚    β”‚ β€’ Summarization β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚
                       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”
                       β”‚   Data Layer    β”‚
                       β”‚                 β”‚
                       β”‚ β€’ Elasticsearch β”‚
                       β”‚ β€’ MongoDB       β”‚
                       β”‚ β€’ MySQL         β”‚
                       β”‚ β€’ Neo4j         β”‚
                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“ Project Structure

icarkno/
β”œβ”€β”€ app/                    # Flask application
β”‚   β”œβ”€β”€ __init__.py        # App factory
β”‚   β”œβ”€β”€ api/               # API blueprints
β”‚   β”œβ”€β”€ services/          # Business logic
β”‚   └── config.py          # Configuration
β”œβ”€β”€ controllers/           # Legacy controllers
β”œβ”€β”€ elastic/              # Elasticsearch integration
β”œβ”€β”€ utils/                # Utilities
β”œβ”€β”€ webhook/              # WhatsApp integration
β”œβ”€β”€ requirements.txt      # Dependencies
β”œβ”€β”€ run.py               # Application entry point
└── README.md            # This file

πŸ”§ Configuration

Environment Variables

Variable Description Required
ES_CLOUD_ID Elasticsearch Cloud ID Yes
ES_API_KEY Elasticsearch API Key Yes
OPENAI_API_KEY OpenAI API Key Yes
MONGO_URL MongoDB connection string Yes
MYSQL_HOST MySQL host Yes
JWT_SECRET_KEY JWT signing key Yes
NEO4J_URI Neo4j connection URI Optional

Supported File Formats

  • Documents: PDF, DOCX, DOC, TXT, PPTX
  • Data: CSV, XLSX, XLS
  • Max file size: 50MB per file
  • Supported languages: 23+ languages

πŸ”Œ API Reference

See API Documentation for detailed endpoint information.

Key Endpoints

  • POST /upload - Upload documents
  • POST /ask - Query documents
  • POST /freeTrial - Trial mode upload
  • POST /trialAsk - Trial mode queries
  • POST /updatepayment - Manage subscriptions
  • GET /healthcheck - Health status

πŸ§ͺ Testing

Unit Tests

python -m pytest tests/

Integration Tests

# Test document processing
python upload_to_elastic.py --file test.pdf --index test_index

# Test querying
python query_elastic.py --index test_index --query "test question"

πŸš€ Deployment

Docker (Recommended)

docker build -t icarkno .
docker run -p 5000:5000 --env-file .env icarkno

Production Setup

# Using Gunicorn
gunicorn --bind 0.0.0.0:5000 --workers 4 run:app

# With SSL
gunicorn --bind 0.0.0.0:443 --certfile cert.pem --keyfile key.pem run:app

πŸ“Š Performance

  • Processing Speed: ~2-5 seconds per page
  • Concurrent Users: Supports 100+ simultaneous users
  • Storage: Elasticsearch scales horizontally
  • Languages: 23+ supported with translation API
  • Rate Limits: Configurable per user tier

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“ License

This project is proprietary software. See LICENSE for details.

πŸ†˜ Support

πŸ™ Acknowledgments


Made with ❀️ by Carnot Research

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages