Repositories list
16 repositories
- Ruby document parsing toolkit with zero runtime dependencies. Parse PDFs, DOCX, XLSX, and images (with OCR) using a single, lightweight gem. Statically links MuPDF and Tesseract at compile time for hassle-free installation: no system libraries or external tools required.
- fastsheet (Public)
- scientist-labs.github.io (Public)
- tokenkit (Public): Fast, Rust-backed word-level tokenization for Ruby. Unlike subword tokenizers (BPE, WordPiece) designed for LLMs, TokenKit provides linguistic tokenization for search engines, text mining, and NLP pipelines, preserving domain-specific patterns like gene names, measurements, and technical terms while handling Unicode correctly.
- phrasekit (Public)
- indradb (Public)
- ragnar-cli (Public): Ragnar is a pure Ruby command-line RAG (Retrieval-Augmented Generation) tool with zero external dependencies. It provides local document indexing, semantic search, and LLM-powered query processing. Built to be hackable, it lets Ruby developers experiment with agentic workflows and RAG pipelines natively in Ruby.
- red-candle (Public)
- lancelot (Public): Ruby bindings for the Lance columnar data format. Built on the Lance Rust crate, Lancelot brings high-performance vector search, full-text search, and hybrid retrieval to Ruby applications with a native, idiomatic API.
- High-performance UMAP dimensionality reduction for Ruby, powered by the annembed Rust crate. Fast, memory-efficient manifold learning with model persistence.
- topical (Public)
- annembed (Public)
- hnswlib-rs (Public)
- rrf (Public)