Gnosis Wraith is a powerful web crawling and content analysis system that serves as the perception layer for AI systems

kordless/gnosis-wraith

Gnosis Wraith

Vision: The Adaptive Web Perception Engine

Gnosis Wraith is a powerful web crawling and content analysis system that serves as the perception layer for AI systems. It captures, analyzes, and transforms content through multiple intelligence layers - from basic crawling and OCR to sophisticated AI analysis and integration.

Gnosis is an AI oracle, and Wraith is the eye.

Try It Now!

Gnosis Wraith

Join the Community

Discord

Key Features

  • 🌐 Intelligent Web Crawling - Process any website with or without JavaScript
  • 📸 Screenshot Capture - High-quality visual representation of web pages
  • 🖼️ OCR Processing - Extract text from images using EasyOCR
  • 🧠 AI Content Analysis - Process content with OpenAI, Claude, Google Gemini or local models
  • 🔄 DOM Content Extraction - Direct browser DOM capturing via extension
  • 📊 Smart Content Filtering - Automated filtering of relevant information
  • 📝 Report Generation - Beautiful Markdown and HTML reports
  • 🧩 Browser Extension - Capture and process content directly from your browser
  • ⚡ Lightning Network - Optional micropayments for AI analysis
  • 🗂️ User-Isolated Storage - Multi-tenant support with hash-based user bucketing
  • ☁️ Cloud-Ready Storage - Seamless switching between local filesystem and Google Cloud Storage

🆕 New in v2 API

  • 💻 JavaScript Execution - Execute custom JavaScript on any webpage with safety validation
  • 🤖 LLM-Powered JavaScript - Generate JavaScript from natural language requests
  • 📋 Content Analysis - Extract entities, sentiment, and structured data using LLMs
  • 🧹 Smart Markdown Cleanup - AI-powered content cleaning and optimization
  • 📄 Intelligent Summarization - Create summaries in multiple formats and styles

Quick Installation

Docker (Recommended)

# Pull the latest image
docker pull kordless/gnosis-wraith:latest

# Run the container
docker run -d -p 5678:5678 --name gnosis-wraith kordless/gnosis-wraith:latest

# Access the web interface at http://localhost:5678

Docker Compose

Create a docker-compose.yml file:

version: '3'
services:
  gnosis-wraith:
    image: kordless/gnosis-wraith:latest
    ports:
      - "5678:5678"
    volumes:
      - ./data:/data
    restart: unless-stopped

Then run:

docker-compose up -d

Cloud Deployment

Gnosis Wraith can be deployed to Google Cloud Run, a serverless container execution environment, out of the box:

# Deploy to Google Cloud Run
gcloud run deploy gnosis-wraith --image kordless/gnosis-wraith:latest --platform managed

Distributed Setup

For high-volume crawling requirements:

  • Deploy multiple instances as on-demand crawlers in a distributed setup
  • Scale horizontally to increase performance for large crawling operations
  • The task system absorbs processing delays, which makes it well suited to asynchronous operation

Performance Considerations

  • While GPU acceleration is supported for OCR and AI processing tasks, the system's asynchronous task architecture efficiently handles processing delays even on CPU-only deployments
  • The job system effectively manages resource-intensive operations by queuing and processing them as resources become available
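The queuing behavior described above can be illustrated with a small, generic sketch. This is not the project's job system: the function names, the job tuples, and the use of `asyncio.sleep` as a stand-in for OCR or AI work are all assumptions made for illustration. It only shows the general pattern of queuing slow jobs and letting a fixed number of workers drain them as capacity frees up.

```python
import asyncio


async def worker(name, queue, results):
    """Pull jobs off the queue one at a time until cancelled."""
    while True:
        job_id, delay = await queue.get()
        try:
            # Stand-in for a slow step such as OCR or an LLM call (assumption).
            await asyncio.sleep(delay)
            results[job_id] = f"done by {name}"
        finally:
            queue.task_done()


async def run_jobs(jobs, concurrency=2):
    """Queue every job, process at most `concurrency` at once, return results."""
    queue = asyncio.Queue()
    results = {}
    for job in jobs:
        queue.put_nowait(job)

    workers = [
        asyncio.create_task(worker(f"worker-{i}", queue, results))
        for i in range(concurrency)
    ]
    await queue.join()  # blocks until every queued job has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    # Hypothetical job ids paired with simulated processing times in seconds.
    jobs = [("ocr-1", 0.02), ("ocr-2", 0.01), ("summary-1", 0.01)]
    print(asyncio.run(run_jobs(jobs)))
```

With `concurrency=1` the same jobs still complete, only more slowly, which is the property that lets a CPU-only deployment keep up without dropping work.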

Browser Extension

For full functionality, install the browser extension:

  1. Access the web interface at http://localhost:5678
  2. Navigate to the "Browser Extension" tab
  3. Download and install the extension following browser-specific instructions
  4. Use keyboard shortcuts or context menus to capture and analyze web content

Usage Guide

Web Interface

The web interface offers multiple ways to interact with Gnosis Wraith:

  • Single URL Crawl - Process any website with customizable options
  • Image Upload - Extract text from images using OCR
  • Browser Extension - Capture and process web content directly

API Endpoints

  • /api/crawl - Crawl URLs and generate comprehensive reports
  • /api/upload - Upload and analyze images
  • /api/jobs - Manage background processing jobs
  • /reports - Access generated reports
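As a rough illustration of calling `/api/crawl` from Python, the sketch below builds a JSON POST request with the standard library. The endpoint path and port come from this README; the HTTP method, the payload field names (`url`, `javascript`, `screenshot`), and the JSON response format are assumptions, so check them against the actual API before relying on them.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:5678"  # port taken from the Docker instructions


def build_crawl_request(target_url, **options):
    """Build (but do not send) a JSON POST to /api/crawl.

    The payload field names are assumptions, not a documented schema.
    """
    payload = {"url": target_url, **options}
    return Request(
        f"{BASE_URL}/api/crawl",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def crawl(target_url, **options):
    """Send the request to a running instance and decode a JSON response."""
    request = build_crawl_request(target_url, **options)
    with urlopen(request, timeout=60) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only inspects the request; call crawl(...) once an instance is running.
    req = build_crawl_request("https://example.com", javascript=True, screenshot=True)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect and adjust once the real schema is known.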

Storage System

Gnosis Wraith features an advanced storage abstraction layer that supports both local development and cloud deployment:

Key Features

  • User Isolation: Each user's reports are stored in separate buckets using SHA-256 hashing
  • Cloud Support: Automatic switching between local filesystem and Google Cloud Storage (GCS)
  • Multi-Tenancy: Built-in support for multiple users with complete data isolation
  • Migration Tools: Scripts to migrate existing reports to the new structure

Storage Structure

users/
├── a1b2c3d4e5f6/     # User hash bucket
│   ├── reports/      # User's reports
│   └── screenshots/  # User's screenshots
└── system/           # System/shared reports
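The bucket layout above could be derived along the following lines. This is a minimal sketch, not the project's implementation: the README only states that SHA-256 hashing is used, so the choice of identifier to hash (an email address here), the normalization step, the 12-character truncation (picked to match the example bucket name), and the function names are all assumptions.

```python
import hashlib
from pathlib import PurePosixPath

BUCKET_LENGTH = 12  # assumption: matches the 12-character example bucket name


def user_bucket(user_id: str) -> str:
    """Map a user identifier to a short, stable, non-reversible bucket name."""
    normalized = user_id.strip().lower()  # assumption: identifiers are case-insensitive
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:BUCKET_LENGTH]


def report_path(user_id: str, filename: str) -> PurePosixPath:
    """Build the storage key for one of a user's reports."""
    return PurePosixPath("users") / user_bucket(user_id) / "reports" / filename


if __name__ == "__main__":
    print(report_path("alice@example.com", "example-report.md"))
```

Because the key is a plain relative path, the same value can serve as a local filesystem path or as an object name in a cloud bucket.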

For detailed implementation information, see:

Configuration Options

Gnosis Wraith offers flexible configuration:

  • JavaScript Rendering - Enable/disable JavaScript for web crawling
  • Screenshot Capture - Take screenshots of web pages
  • OCR Extraction - Extract text from images using OCR
  • Markdown Extraction - Control content extraction methods
  • AI Integration - Connect with various AI providers
  • Storage Backend - Choose between local filesystem or Google Cloud Storage
  • User Bucketing - Enable/disable user isolation for multi-tenant deployments

Architecture

Gnosis Wraith is built on a modular architecture with these core components:

  1. Web Server - Asynchronous Quart application for HTTP interface
  2. Browser Engine - Playwright-based automation for web interaction
  3. Processing Pipeline - Multi-stage content extraction and analysis
  4. Job System - Background processing for long-running tasks
  5. Storage Layer - Efficient report and image management
  6. Extension Integration - Browser-based content capture

Development Setup

To contribute to or modify the project:

  1. Clone the repository

    git clone https://github.com/kordless/gnosis-wraith.git
    cd gnosis-wraith
  2. Create and activate a virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Run the application

    python app.py

Future Roadmap

Gnosis Wraith is evolving toward a dynamic module system that will:

  • 🔄 Generate Client Modules - Create Python code to interact with any API or service on demand
  • 🔄 Interface Mirroring - Simulate any API it observes for other systems to use
  • 🔄 Data Source Integration - Connect to databases, queues, and network protocols automatically
  • 🔄 Self-Improvement - Learn from usage patterns to enhance generated code

This system will enable Gnosis Wraith to act as a universal adapter between diverse web services and data sources, dynamically extending its capabilities without manual coding.

Technologies

  • Python - Primary development language
  • Playwright - Modern browser automation
  • EasyOCR - Optical character recognition
  • Quart - Asynchronous web framework
  • AI Integration - OpenAI, Claude, Gemini, Ollama
  • Docker - Containerization for deployment

License

See LICENSE.md for details.


Seeing is believing. Gnosis Wraith sees it all.
