Name	Name	Last commit message	Last commit date
Latest commit History 18 Commits
app	app
controllers	controllers
elastic	elastic
utils	utils
webhook	webhook
.gitignore	.gitignore
api-documentation.md	api-documentation.md
config.py	config.py
readme.md	readme.md
requirements.txt	requirements.txt
run.py	run.py

icarKno - Document Intelligence Platform

An AI-powered document analysis platform that enables intelligent querying, summarization, and knowledge extraction from various document formats.

✨ Features

📄 Document Processing

Multi-format Support: PDF, DOCX, DOC, TXT, PPTX, CSV, XLSX
Intelligent Chunking: Hierarchical text segmentation for optimal retrieval
Vector Embeddings: Advanced semantic search using OpenAI embeddings

🔍 Smart Search & Retrieval

Hybrid Search: Combines keyword and vector search for best results
Contextual Answers: AI-powered responses with source citations
Creative Reasoning: Advanced multi-step analysis for complex queries

🧠 AI Capabilities

Document Summarization: Extractive and abstractive summaries
Question Answering: Natural language queries with source attribution
Knowledge Graphs: Visual representation of document relationships
Multi-language: Support for 23+ languages including Hindi, Tamil, Bengali

📊 Data Integration

Structured Data: Query CSV/Excel files using natural language
SQL Generation: Automatic conversion of questions to database queries
Hybrid Analysis: Combine document insights with data analytics

🌐 Integration & Access

REST API: Complete programmatic access
WhatsApp Bot: Chat with your documents via WhatsApp
Web Interface: User-friendly document upload and query interface
Authentication: JWT-based secure access with subscription tiers

🚀 Quick Start

Prerequisites

Python 3.8 or higher
Elasticsearch cluster (cloud or self-hosted)
OpenAI API access
MySQL database
MongoDB instance

Installation

Clone the repository

git clone https://github.com/your-org/icarkno.git
cd icarkno

Create virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies
```
pip install -r requirements.txt
```

Environment setup

cp .env.example .env
# Edit .env with your configuration

Configure environment variables

# Elasticsearch
ES_CLOUD_ID=your-elasticsearch-cloud-id
ES_API_KEY=your-elasticsearch-api-key

# OpenAI
OPENAI_API_KEY=your-openai-api-key

# Database
MONGO_URL=mongodb://localhost:27017/icarkno
MYSQL_HOST=localhost
MYSQL_USERNAME=your-username
MYSQL_PASSWORD=your-password

# Security
JWT_SECRET_KEY=your-jwt-secret
SECRET_KEY=your-flask-secret

# Neo4j (for knowledge graphs)
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-password

Run the application
```
python run.py
```

The API will be available at http://localhost:5000

📖 Usage

Basic Document Processing

# Upload documents
curl -X POST http://localhost:5000/upload \
  -F "[email protected]" \
  -F "token=your-jwt-token" \
  -F "sessionId=session123"

# Query documents
curl -X POST http://localhost:5000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "token": "your-jwt-token",
    "sessionId": "session123",
    "message": "What are the key findings?",
    "context": true
  }'

Free Trial Mode

# Upload for trial users
curl -X POST http://localhost:5000/freeTrial \
  -F "[email protected]" \
  -F "fingerprint=unique-browser-fingerprint"

# Query in trial mode
curl -X POST http://localhost:5000/trialAsk \
  -H "Content-Type: application/json" \
  -d '{
    "fingerprint": "unique-browser-fingerprint",
    "message": "Summarize this document"
  }'

Advanced Features

Creative Mode: Add "mode": "creative" for multi-step reasoning
Knowledge Graphs: Use /create_graph endpoint for visual relationships
Multi-language: Set "outputLanguage": 1 for Hindi responses
Data Queries: Upload CSV/Excel and query with natural language

🏗️ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Client Apps   │    │   Flask API      │    │   AI Services   │
│                 │────│                  │────│                 │
│ • Web Interface │    │ • Authentication │    │ • OpenAI GPT    │
│ • WhatsApp Bot  │    │ • Rate Limiting  │    │ • Embeddings    │
│ • Mobile App    │    │ • File Processing│    │ • Summarization │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                       ┌────────┴────────┐
                       │   Data Layer    │
                       │                 │
                       │ • Elasticsearch │
                       │ • MongoDB       │
                       │ • MySQL         │
                       │ • Neo4j         │
                       └─────────────────┘

📁 Project Structure

icarkno/
├── app/                    # Flask application
│   ├── __init__.py        # App factory
│   ├── api/               # API blueprints
│   ├── services/          # Business logic
│   └── config.py          # Configuration
├── controllers/           # Legacy controllers
├── elastic/              # Elasticsearch integration
├── utils/                # Utilities
├── webhook/              # WhatsApp integration
├── requirements.txt      # Dependencies
├── run.py               # Application entry point
└── README.md            # This file

🔧 Configuration

Environment Variables

Variable	Description	Required
`ES_CLOUD_ID`	Elasticsearch Cloud ID	Yes
`ES_API_KEY`	Elasticsearch API Key	Yes
`OPENAI_API_KEY`	OpenAI API Key	Yes
`MONGO_URL`	MongoDB connection string	Yes
`MYSQL_HOST`	MySQL host	Yes
`JWT_SECRET_KEY`	JWT signing key	Yes
`NEO4J_URI`	Neo4j connection URI	Optional

Supported File Formats

Documents: PDF, DOCX, DOC, TXT, PPTX
Data: CSV, XLSX, XLS
Max file size: 50MB per file
Supported languages: 23+ languages

🔌 API Reference

See API Documentation for detailed endpoint information.

Key Endpoints

POST /upload - Upload documents
POST /ask - Query documents
POST /freeTrial - Trial mode upload
POST /trialAsk - Trial mode queries
POST /updatepayment - Manage subscriptions
GET /healthcheck - Health status

🧪 Testing

Unit Tests

python -m pytest tests/

Integration Tests

# Test document processing
python upload_to_elastic.py --file test.pdf --index test_index

# Test querying
python query_elastic.py --index test_index --query "test question"

🚀 Deployment

Docker (Recommended)

docker build -t icarkno .
docker run -p 5000:5000 --env-file .env icarkno

Production Setup

# Using Gunicorn
gunicorn --bind 0.0.0.0:5000 --workers 4 run:app

# With SSL
gunicorn --bind 0.0.0.0:443 --certfile cert.pem --keyfile key.pem run:app

📊 Performance

Processing Speed: ~2-5 seconds per page
Concurrent Users: Supports 100+ simultaneous users
Storage: Elasticsearch scales horizontally
Languages: 23+ supported with translation API
Rate Limits: Configurable per user tier

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit changes (git commit -m 'Add amazing feature')
Push to branch (git push origin feature/amazing-feature)
Open a Pull Request

📝 License

This project is proprietary software. See LICENSE for details.

🆘 Support

Documentation: API Docs
Issues: GitHub Issues
Email: [email protected]

🙏 Acknowledgments

OpenAI for GPT and embedding models
Elasticsearch for search and analytics
LangChain for LLM orchestration
Flask for the web framework

Made with ❤️ by Carnot Research

carnotresearch/new-qdoc

Folders and files

Latest commit

History

Repository files navigation