    Repositories list

    • parsekit

      Public
      Ruby document parsing toolkit with zero runtime dependencies. Parse PDFs, DOCX, XLSX, and images (with OCR) using a single, lightweight gem. Statically links MuPDF and Tesseract at compile time for hassle-free installation - no system libraries or external tools required.
      Ruby
      14001 Updated Oct 13, 2025
    • fastsheet

      Public
      FastSheet is the fastest XLSX file parser for Ruby (at the time of release). It leverages a Rust library for high-performance parsing, making it significantly faster than other available solutions.
      Ruby
      7000 Updated Oct 9, 2025
    • Scientist-labs portfolio page.
      TypeScript
      2k000 Updated Sep 30, 2025
    • tokenkit

      Public
      Fast, Rust-backed word-level tokenization for Ruby. Unlike subword tokenizers (BPE, WordPiece) designed for LLMs, TokenKit provides linguistic tokenization for search engines, text mining, and NLP pipelines—preserving domain-specific patterns like gene names, measurements, and technical terms while handling Unicode correctly.
      Ruby
      0300 Updated Sep 29, 2025
    • spellkit

      Public
      Fast, safe typo correction for Ruby. SymSpell-based spell checker with Rust performance, term protection via regex patterns, and hot-reloadable dictionaries. Sub-millisecond latency, zero dependencies.
      Ruby
      0600 Updated Sep 28, 2025
    • phrasekit

      Public
      Weak supervision for NER: mine domain-specific phrases from unlabeled corpora, score by salience, and auto-generate training labels. Ruby gem with high-performance Rust engines.
      Rust
      0001 Updated Sep 28, 2025
    • indradb

      Public
      A graph database written in Rust
      Rust
      130101 Updated Sep 25, 2025
    • Ragnar is a pure Ruby command-line RAG (Retrieval-Augmented Generation) tool with zero external dependencies. It provides local document indexing, semantic search, and LLM-powered query processing. Built to be hackable, it lets Ruby developers experiment with agentic workflows and RAG pipelines natively in Ruby.
      Ruby
      1701 Updated Sep 15, 2025
    • Ruby gem for running state-of-the-art language models locally. Access LLMs, embeddings, rerankers, and NER models directly from Ruby using Rust-powered Candle with Metal/CUDA acceleration.
      Rust
      618431 Updated Sep 13, 2025
    • thor-interactive

      Public
      Turn any Thor CLI into an interactive REPL with persistent state, auto-completion, and configurable default handlers for unrecognized input.
      Ruby
      0202 Updated Sep 8, 2025
    • lancelot

      Public
      Ruby bindings for the Lance columnar data format. Built on the Lance Rust crate, Lancelot brings high-performance vector search, full-text search, and hybrid retrieval to Ruby applications with a native, idiomatic API.
      Ruby
      0700 Updated Sep 8, 2025
    • clusterkit

      Public
      High-performance UMAP dimensionality reduction for Ruby, powered by the annembed Rust crate. Fast, memory-efficient manifold learning with model persistence.
      Ruby
      1500 Updated Sep 6, 2025
    • topical

      Public
      Ruby library for fast, flexible topic modeling — built on modern embeddings and clustering techniques to uncover themes in text.
      Ruby
      01010 Updated Sep 6, 2025
    • annembed

      Public
      Data embedding based on approximate nearest neighbour
      Rust
      8003 Updated Sep 6, 2025
    • Rust implementation of the HNSW algorithm (Malkov-Yashunin)
      Rust
      38001 Updated Aug 19, 2025
    • rrf

      Public
      Reciprocal Rank Fusion for Ruby
      Ruby
      0511 Updated Jul 18, 2025