Gnosis Wraith is a powerful web crawling and content analysis system that serves as the perception layer for AI systems. It captures, analyzes, and transforms content through multiple intelligence layers, from basic crawling and OCR to AI analysis and integration.
Gnosis is an AI oracle, and Wraith is the eye.
- Intelligent Web Crawling - Process any website with or without JavaScript
- Screenshot Capture - High-quality visual representation of web pages
- OCR Processing - Extract text from images using EasyOCR
- AI Content Analysis - Process content with OpenAI, Claude, Google Gemini or local models
- DOM Content Extraction - Direct browser DOM capturing via extension
- Smart Content Filtering - Automated filtering of relevant information
- Report Generation - Beautiful Markdown and HTML reports
- Browser Extension - Capture and process content directly from your browser
- Lightning Network - Optional micropayments for AI analysis
- User-Isolated Storage - Multi-tenant support with hash-based user bucketing
- Cloud-Ready Storage - Seamless switching between local filesystem and Google Cloud Storage
- JavaScript Execution - Execute custom JavaScript on any webpage with safety validation
- LLM-Powered JavaScript - Generate JavaScript from natural language requests
- Content Analysis - Extract entities, sentiment, and structured data using LLMs
- Smart Markdown Cleanup - AI-powered content cleaning and optimization
- Intelligent Summarization - Create summaries in multiple formats and styles
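The feature list mentions JavaScript execution with safety validation, but the actual checks are not described here. As a minimal sketch, assuming a simple pattern denylist plus a size limit (the names `validate_javascript`, `UNSAFE_PATTERNS` and `MAX_SCRIPT_LENGTH` are hypothetical, not Gnosis Wraith's API), a pre-execution check could look like this:

```python
import re
from typing import List

# Hypothetical denylist: constructs this sketch treats as unsafe.
# The real Gnosis Wraith validation rules may differ.
UNSAFE_PATTERNS = {
    "eval": re.compile(r"\beval\s*\("),
    "Function constructor": re.compile(r"\bnew\s+Function\s*\("),
    "cookie access": re.compile(r"\bdocument\.cookie\b"),
    "storage access": re.compile(r"\b(localStorage|sessionStorage)\b"),
}

MAX_SCRIPT_LENGTH = 10_000  # hypothetical size limit in characters


def validate_javascript(code: str) -> List[str]:
    """Return a list of problems; an empty list means the script passed."""
    problems = []
    if len(code) > MAX_SCRIPT_LENGTH:
        problems.append(f"script exceeds {MAX_SCRIPT_LENGTH} characters")
    # Report every denylisted construct found anywhere in the script.
    for name, pattern in UNSAFE_PATTERNS.items():
        if pattern.search(code):
            problems.append(f"disallowed construct: {name}")
    return problems
```

For example, `validate_javascript("document.title")` returns an empty list, while a script calling `eval(...)` is rejected. A denylist like this is easy to bypass (for instance through string concatenation), so it works best as a first filter in front of a real sandbox, not as a replacement for one.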
```bash
# Pull the latest image
docker pull kordless/gnosis-wraith:latest
# Run the container
docker run -d -p 5678:5678 --name gnosis-wraith kordless/gnosis-wraith:latest
# Access the web interface at http://localhost:5678
```
Create a docker-compose.yml file:

```yaml
version: '3'
services:
  gnosis-wraith:
    image: kordless/gnosis-wraith:latest
    ports:
      - "5678:5678"
    volumes:
      - ./data:/data
    restart: unless-stopped
```

Then run:

```bash
docker-compose up -d
```
Gnosis Wraith can be deployed to Google Cloud Run, a serverless container execution environment, without modification:

```bash
# Deploy to Google Cloud Run; --port tells Cloud Run the container listens on 5678
gcloud run deploy gnosis-wraith --image kordless/gnosis-wraith:latest --platform managed --port 5678
```
For high-volume crawling:
- Deploy multiple instances as on-demand crawlers in a distributed setup
- Scale horizontally to increase throughput for large crawling operations
- Use GPU acceleration for OCR and AI processing where available; CPU-only deployments also work, because the asynchronous task system absorbs the longer processing times
- Rely on the job system to queue resource-intensive operations and process them as resources become available
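The queue-and-worker behavior described above can be illustrated with `asyncio`. This is a minimal sketch, not Gnosis Wraith's actual job system: `process_job`, `run_jobs` and `max_concurrent` are hypothetical names, and the short sleep stands in for slow OCR or AI work.

```python
import asyncio
from typing import List


async def process_job(job_id: int) -> str:
    # Stand-in for a slow task such as OCR or an LLM call (hypothetical).
    await asyncio.sleep(0.01)
    return f"job-{job_id}: done"


async def worker(queue: asyncio.Queue, results: List[str]) -> None:
    # Each worker pulls the next job only when it is free.
    while True:
        job_id = await queue.get()
        try:
            results.append(await process_job(job_id))
        finally:
            queue.task_done()


async def run_jobs(job_ids: List[int], max_concurrent: int = 2) -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    for job_id in job_ids:
        queue.put_nowait(job_id)
    results: List[str] = []
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(max_concurrent)]
    await queue.join()  # wait until every queued job has been marked done
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

Usage: `asyncio.run(run_jobs([1, 2, 3]))`. Because at most `max_concurrent` jobs run at once, a burst of submissions waits in the queue instead of overloading a CPU-only host; results may arrive out of submission order.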
For full functionality, install the browser extension:
- Access the web interface at http://localhost:5678
- Navigate to the "Browser Extension" tab
- Download and install the extension following browser-specific instructions
- Use keyboard shortcuts or context menus to capture and analyze web content
The web interface offers multiple ways to interact with Gnosis Wraith:
- Single URL Crawl - Process any website with customizable options
- Image Upload - Extract text from images using OCR
- Browser Extension - Capture and process web content directly
Key HTTP endpoints:
- /api/crawl - Crawl URLs and generate comprehensive reports
- /api/upload - Upload and analyze images
- /api/jobs - Manage background processing jobs
- /reports - Access generated reports
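A client for `/api/crawl` could be written with the standard library alone. This sketch assumes the endpoint accepts a JSON `POST` body; the payload field names (`url`, `javascript_enabled`, `take_screenshot`) and the helper names are hypothetical, so check the server code for the actual request schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5678"  # default port from the Docker examples


def build_crawl_request(url: str, javascript_enabled: bool = True,
                        take_screenshot: bool = True) -> urllib.request.Request:
    # Field names below are assumptions, not the documented API schema.
    payload = {
        "url": url,
        "javascript_enabled": javascript_enabled,
        "take_screenshot": take_screenshot,
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/crawl",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def crawl(url: str) -> dict:
    # Sends the request and decodes the JSON response; needs a running server.
    with urllib.request.urlopen(build_crawl_request(url), timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a live server.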
Gnosis Wraith features an advanced storage abstraction layer that supports both local development and cloud deployment:
- User Isolation: Each user's reports are stored in separate buckets using SHA-256 hashing
- Cloud Support: Automatic switching between local filesystem and Google Cloud Storage (GCS)
- Multi-Tenancy: Built-in support for multiple users with complete data isolation
- Migration Tools: Scripts to migrate existing reports to the new structure
```text
users/
├── a1b2c3d4e5f6/    # User hash bucket
│   ├── reports/     # User's reports
│   └── screenshots/ # User's screenshots
└── system/          # System/shared reports
```
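The hash-based bucketing in this layout can be sketched with `hashlib`. The exact input that Gnosis Wraith hashes is not stated here; this sketch assumes a normalized user identifier (such as an email) and a bucket name made of the first 12 hex characters of its SHA-256 digest, matching the length of the example bucket. `user_bucket` and `report_path` are hypothetical helpers.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Optional


def user_bucket(user_id: str, length: int = 12) -> str:
    # Assumption: normalize the identifier, hash it with SHA-256,
    # and keep a fixed-length hex prefix as the bucket name.
    normalized = user_id.strip().lower().encode("utf-8")
    return hashlib.sha256(normalized).hexdigest()[:length]


def report_path(user_id: Optional[str], filename: str) -> PurePosixPath:
    # Reports without an owning user go to the shared system/ area.
    if user_id is None:
        return PurePosixPath("users", "system", filename)
    return PurePosixPath("users", user_bucket(user_id), "reports", filename)
```

Because the same identifier always maps to the same bucket, lookups need no index, and directory names do not expose the raw identifier. The same relative path can serve as a local filesystem path or a Google Cloud Storage object name.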
For detailed implementation information, see:
Gnosis Wraith offers flexible configuration:
- JavaScript Rendering - Enable/disable JavaScript for web crawling
- Screenshot Capture - Take screenshots of web pages
- OCR Extraction - Extract text from images using OCR
- Markdown Extraction - Control content extraction methods
- AI Integration - Connect with various AI providers
- Storage Backend - Choose between local filesystem or Google Cloud Storage
- User Bucketing - Enable/disable user isolation for multi-tenant deployments
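One way to hold a subset of these options is a frozen dataclass loaded from environment variables. This is a minimal sketch: the `WraithConfig` class, the `WRAITH_*` variable names, the provider strings and all defaults are hypothetical, not Gnosis Wraith's real settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_BACKENDS = {"local", "gcs"}


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    # Interpret common truthy strings; fall back to the default when unset.
    value = env.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class WraithConfig:
    javascript: bool = True
    screenshot: bool = True
    ocr: bool = False
    ai_provider: Optional[str] = None  # hypothetical values, e.g. "openai" or "ollama"
    storage_backend: str = "local"     # "local" filesystem or "gcs"
    user_bucketing: bool = True

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "WraithConfig":
        env = os.environ if env is None else env
        backend = env.get("WRAITH_STORAGE_BACKEND", "local").lower()
        if backend not in VALID_BACKENDS:
            raise ValueError(f"unknown storage backend: {backend!r}")
        return cls(
            javascript=_flag(env, "WRAITH_JAVASCRIPT", True),
            screenshot=_flag(env, "WRAITH_SCREENSHOT", True),
            ocr=_flag(env, "WRAITH_OCR", False),
            ai_provider=env.get("WRAITH_AI_PROVIDER"),
            storage_backend=backend,
            user_bucketing=_flag(env, "WRAITH_USER_BUCKETING", True),
        )
```

Passing an explicit mapping to `from_env` keeps the loader testable, and rejecting unknown backends at startup surfaces misconfiguration before the first crawl.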
Gnosis Wraith is built on a modular architecture with these core components:
- Web Server - Asynchronous Quart application for HTTP interface
- Browser Engine - Playwright-based automation for web interaction
- Processing Pipeline - Multi-stage content extraction and analysis
- Job System - Background processing for long-running tasks
- Storage Layer - Efficient report and image management
- Extension Integration - Browser-based content capture
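The multi-stage processing pipeline can be pictured as a chain of functions that each take and return a page result. This is a minimal sketch: `PageResult`, the two stages and `run_pipeline` are hypothetical, and the regex-based stages stand in for the real Playwright, OCR and LLM steps.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PageResult:
    url: str
    html: str
    text: str = ""
    notes: List[str] = field(default_factory=list)  # stages that have run


Stage = Callable[[PageResult], PageResult]


def extract_text(page: PageResult) -> PageResult:
    # Crude tag stripping and whitespace collapsing; a stand-in for real extraction.
    page.text = re.sub(r"\s+", " ", re.sub(r"<[^>]+>", " ", page.html)).strip()
    page.notes.append("extract_text")
    return page


def filter_short(page: PageResult) -> PageResult:
    # Stand-in for content filtering: discard pages with almost no text.
    if len(page.text.split()) < 3:
        page.text = ""
    page.notes.append("filter_short")
    return page


def run_pipeline(page: PageResult, stages: List[Stage]) -> PageResult:
    # Each stage receives the previous stage's output.
    for stage in stages:
        page = stage(page)
    return page
```

Usage: `run_pipeline(PageResult(url, html), [extract_text, filter_short])`. Giving every stage the same signature lets stages such as OCR or AI analysis be added, removed, or reordered without touching the others.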
For those wanting to contribute or modify:
- Clone the repository:

```bash
git clone https://github.com/kordless/gnosis-wraith.git
cd gnosis-wraith
```

- Create and activate a virtual environment:

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

- Run the application:

```bash
python app.py
```
Gnosis Wraith is evolving toward a dynamic module system that will:
- Generate Client Modules - Create Python code to interact with any API or service on demand
- Interface Mirroring - Simulate any API it observes for other systems to use
- Data Source Integration - Connect to databases, queues, and network protocols automatically
- Self-Improvement - Learn from usage patterns to enhance generated code
This system will enable Gnosis Wraith to act as a universal adapter between diverse web services and data sources, dynamically extending its capabilities without manual coding.
- Python - Primary development language
- Playwright - Modern browser automation
- EasyOCR - Optical character recognition
- Quart - Asynchronous web framework
- AI Integration - OpenAI, Claude, Gemini, Ollama
- Docker - Containerization for deployment
See LICENSE.md for details.
Seeing is believing. Gnosis Wraith sees it all.