An AI-powered document analysis platform that enables intelligent querying, summarization, and knowledge extraction from various document formats.
- Multi-format Support: PDF, DOCX, DOC, TXT, PPTX, CSV, XLSX
- Intelligent Chunking: Hierarchical text segmentation for optimal retrieval
- Vector Embeddings: Advanced semantic search using OpenAI embeddings
- Hybrid Search: Combines keyword and vector search for best results
- Contextual Answers: AI-powered responses with source citations
- Creative Reasoning: Advanced multi-step analysis for complex queries
- Document Summarization: Extractive and abstractive summaries
- Question Answering: Natural language queries with source attribution
- Knowledge Graphs: Visual representation of document relationships
- Multi-language: Support for 23+ languages including Hindi, Tamil, Bengali
- Structured Data: Query CSV/Excel files using natural language
- SQL Generation: Automatic conversion of questions to database queries
- Hybrid Analysis: Combine document insights with data analytics
- REST API: Complete programmatic access
- WhatsApp Bot: Chat with your documents via WhatsApp
- Web Interface: User-friendly document upload and query interface
- Authentication: JWT-based secure access with subscription tiers
- Python 3.8 or higher
- Elasticsearch cluster (cloud or self-hosted)
- OpenAI API access
- MySQL database
- MongoDB instance
- Clone the repository

  ```bash
  git clone https://github.com/your-org/icarkno.git
  cd icarkno
  ```

- Create virtual environment

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```

- Environment setup

  ```bash
  cp .env.example .env  # Edit .env with your configuration
  ```

- Configure environment variables

  ```env
  # Elasticsearch
  ES_CLOUD_ID=your-elasticsearch-cloud-id
  ES_API_KEY=your-elasticsearch-api-key

  # OpenAI
  OPENAI_API_KEY=your-openai-api-key

  # Database
  MONGO_URL=mongodb://localhost:27017/icarkno
  MYSQL_HOST=localhost
  MYSQL_USERNAME=your-username
  MYSQL_PASSWORD=your-password

  # Security
  JWT_SECRET_KEY=your-jwt-secret
  SECRET_KEY=your-flask-secret

  # Neo4j (for knowledge graphs)
  NEO4J_URI=bolt://localhost:7687
  NEO4J_USERNAME=neo4j
  NEO4J_PASSWORD=your-password
  ```

- Run the application

  ```bash
  python run.py
  ```
The API will be available at http://localhost:5000
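Once the server is running, a quick way to verify it from Python is to call the health endpoint (`GET /healthcheck`, listed in the endpoint summary further down); a minimal sketch using `requests`:

```python
import requests

# Sanity check against a locally running instance.
resp = requests.get("http://localhost:5000/healthcheck", timeout=5)
print(resp.status_code, resp.text)
```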
```bash
# Upload documents
curl -X POST http://localhost:5000/upload \
  -F "file=@document.pdf" \
  -F "token=your-jwt-token" \
  -F "sessionId=session123"
```
```bash
# Query documents
curl -X POST http://localhost:5000/ask \
  -H "Content-Type: application/json" \
  -d '{
    "token": "your-jwt-token",
    "sessionId": "session123",
    "message": "What are the key findings?",
    "context": true
  }'
```
```bash
# Upload for trial users
curl -X POST http://localhost:5000/freeTrial \
  -F "file=@document.pdf" \
  -F "fingerprint=unique-browser-fingerprint"
```
```bash
# Query in trial mode
curl -X POST http://localhost:5000/trialAsk \
  -H "Content-Type: application/json" \
  -d '{
    "fingerprint": "unique-browser-fingerprint",
    "message": "Summarize this document"
  }'
```
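The same upload-and-query flow can be scripted from Python; a minimal sketch using `requests`, mirroring the curl examples above (the JWT, session ID, file name, and the multipart field name `file` are placeholders/assumptions):

```python
import requests

BASE_URL = "http://localhost:5000"
TOKEN = "your-jwt-token"      # placeholder JWT
SESSION_ID = "session123"     # placeholder session identifier

# Upload a document (multipart form, as in the /upload curl example)
with open("document.pdf", "rb") as f:
    upload = requests.post(
        f"{BASE_URL}/upload",
        files={"file": f},                               # field name assumed
        data={"token": TOKEN, "sessionId": SESSION_ID},
    )
print(upload.status_code)

# Ask a question against the uploaded documents (as in the /ask curl example)
answer = requests.post(
    f"{BASE_URL}/ask",
    json={
        "token": TOKEN,
        "sessionId": SESSION_ID,
        "message": "What are the key findings?",
        "context": True,
    },
)
print(answer.json())
```

The trial endpoints (`/freeTrial`, `/trialAsk`) follow the same pattern with `fingerprint` in place of the token and session ID.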
- Creative Mode: Add `"mode": "creative"` for multi-step reasoning (see the sketch below)
- Knowledge Graphs: Use the `/create_graph` endpoint for visual relationships
- Multi-language: Set `"outputLanguage": 1` for Hindi responses
- Data Queries: Upload CSV/Excel and query with natural language
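For example, combining creative mode with a Hindi response is a matter of adding these fields to the query payload; a sketch, assuming the options are passed in the same `/ask` request body shown above:

```python
import requests

# /ask payload with the optional flags described above:
# "mode": "creative" enables multi-step reasoning,
# "outputLanguage": 1 requests Hindi output.
payload = {
    "token": "your-jwt-token",    # placeholder JWT
    "sessionId": "session123",    # placeholder session identifier
    "message": "Compare the conclusions of the uploaded reports",
    "context": True,
    "mode": "creative",
    "outputLanguage": 1,
}
resp = requests.post("http://localhost:5000/ask", json=payload)
print(resp.json())
```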
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Client Apps   │     │    Flask API     │     │   AI Services   │
│                 │─────│                  │─────│                 │
│ • Web Interface │     │ • Authentication │     │ • OpenAI GPT    │
│ • WhatsApp Bot  │     │ • Rate Limiting  │     │ • Embeddings    │
│ • Mobile App    │     │ • File Processing│     │ • Summarization │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 │
                        ┌────────┴─────────┐
                        │    Data Layer    │
                        │                  │
                        │ • Elasticsearch  │
                        │ • MongoDB        │
                        │ • MySQL          │
                        │ • Neo4j          │
                        └──────────────────┘
```
```
icarkno/
├── app/                 # Flask application
│   ├── __init__.py      # App factory
│   ├── api/             # API blueprints
│   ├── services/        # Business logic
│   └── config.py        # Configuration
├── controllers/         # Legacy controllers
├── elastic/             # Elasticsearch integration
├── utils/               # Utilities
├── webhook/             # WhatsApp integration
├── requirements.txt     # Dependencies
├── run.py               # Application entry point
└── README.md            # This file
```
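The `app/__init__.py` app factory is what ties the blueprints and configuration together; a minimal, illustrative sketch of that pattern (the blueprint and route below are placeholders, not the project's actual modules):

```python
# Illustrative sketch of the Flask app-factory pattern used in app/__init__.py.
# The real factory registers the blueprints under app/api/ and loads app/config.py.
from flask import Blueprint, Flask, jsonify

# Placeholder blueprint standing in for the real API blueprints
api_bp = Blueprint("api", __name__)


@api_bp.get("/healthcheck")
def healthcheck():
    return jsonify({"status": "ok"})


def create_app() -> Flask:
    app = Flask(__name__)
    # Real configuration (Elasticsearch, OpenAI, databases) comes from
    # app/config.py and the environment variables described below.
    app.register_blueprint(api_bp)
    return app
```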
| Variable | Description | Required |
|---|---|---|
| `ES_CLOUD_ID` | Elasticsearch Cloud ID | Yes |
| `ES_API_KEY` | Elasticsearch API Key | Yes |
| `OPENAI_API_KEY` | OpenAI API Key | Yes |
| `MONGO_URL` | MongoDB connection string | Yes |
| `MYSQL_HOST` | MySQL host | Yes |
| `JWT_SECRET_KEY` | JWT signing key | Yes |
| `NEO4J_URI` | Neo4j connection URI | Optional |
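These values are read from the environment at startup (populated from `.env` during setup); a small sketch of how a deployment check might confirm the required ones are present, using the variable names from the table above (the check itself is illustrative, not part of the repository):

```python
import os

# Required variables from the configuration table above.
REQUIRED = [
    "ES_CLOUD_ID",
    "ES_API_KEY",
    "OPENAI_API_KEY",
    "MONGO_URL",
    "MYSQL_HOST",
    "JWT_SECRET_KEY",
]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")

# NEO4J_URI is optional; it is only needed for knowledge-graph features.
if not os.environ.get("NEO4J_URI"):
    print("NEO4J_URI not set; knowledge-graph features may be unavailable.")
```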
- Documents: PDF, DOCX, DOC, TXT, PPTX
- Data: CSV, XLSX, XLS
- Max file size: 50MB per file
- Supported languages: 23+ languages
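Clients can pre-check files against these limits before uploading; a small sketch (the extension list and 50MB cap come from the limits above; the helper itself is illustrative, not part of the API):

```python
from pathlib import Path

# Formats and size cap listed above.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".doc", ".txt", ".pptx", ".csv", ".xlsx", ".xls"}
MAX_FILE_SIZE = 50 * 1024 * 1024  # 50MB per file


def is_uploadable(path: str) -> bool:
    """Return True if the file has a supported extension and fits the size cap."""
    p = Path(path)
    return p.suffix.lower() in ALLOWED_EXTENSIONS and p.stat().st_size <= MAX_FILE_SIZE
```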
See API Documentation for detailed endpoint information.
- `POST /upload` - Upload documents
- `POST /ask` - Query documents
- `POST /freeTrial` - Trial mode upload
- `POST /trialAsk` - Trial mode queries
- `POST /updatepayment` - Manage subscriptions
- `GET /healthcheck` - Health status
```bash
python -m pytest tests/
```

```bash
# Test document processing
python upload_to_elastic.py --file test.pdf --index test_index

# Test querying
python query_elastic.py --index test_index --query "test question"
```
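Unit tests can also exercise the API through Flask's test client without a running server; a minimal sketch of such a test, assuming an app factory named `create_app` in `app/__init__.py` (the file name and fixture are illustrative):

```python
# tests/test_healthcheck.py -- illustrative; assumes create_app() in app/__init__.py
import pytest

from app import create_app


@pytest.fixture
def client():
    app = create_app()
    app.config["TESTING"] = True
    with app.test_client() as client:
        yield client


def test_healthcheck_returns_ok(client):
    response = client.get("/healthcheck")
    assert response.status_code == 200
```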
```bash
docker build -t icarkno .
docker run -p 5000:5000 --env-file .env icarkno
```

```bash
# Using Gunicorn
gunicorn --bind 0.0.0.0:5000 --workers 4 run:app

# With SSL
gunicorn --bind 0.0.0.0:443 --certfile cert.pem --keyfile key.pem run:app
```
- Processing Speed: ~2-5 seconds per page
- Concurrent Users: Supports 100+ simultaneous users
- Storage: Elasticsearch scales horizontally
- Languages: 23+ supported with translation API
- Rate Limits: Configurable per user tier
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is proprietary software. See LICENSE for details.
- Documentation: API Docs
- Issues: GitHub Issues
- Email: [email protected]
- OpenAI for GPT and embedding models
- Elasticsearch for search and analytics
- LangChain for LLM orchestration
- Flask for the web framework
Made with ❤️ by Carnot Research