Releases: scientist-labs/lancelot

Lancelot 0.3.0 Release

10 Aug 03:12
831afef

New Features

  • Reciprocal Rank Fusion (RRF): Added support for hybrid search combining vector and text search results with configurable alpha weighting (#4)
  • Optional Field Support: Records can now be added with missing fields; unspecified fields are set to null instead of raising an error
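To make the optional-field behavior concrete, here is a pure-Ruby sketch of what "missing fields become null" means for a record checked against a schema hash. `fill_missing_fields` is a hypothetical helper, not part of Lancelot. According to these notes, the gem now does the equivalent internally when you add records.

```ruby
# Hypothetical helper, NOT part of Lancelot's API: it mimics the 0.3.0
# behavior by mapping every schema field the record omits to nil (null).
# Keys that are not in the schema are simply dropped in this sketch.
def fill_missing_fields(record, schema)
  schema.keys.to_h { |field| [field, record.fetch(field, nil)] }
end

schema = { title: :string, content: :string, year: :int32 }
row = fill_missing_fields({ title: "Intro to Ruby" }, schema)
puts row.inspect # title keeps its value; content and year are nil
```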

Improvements

  • Enhanced README with comprehensive quick start guide and usage examples
  • Added example demonstrating optional field usage (examples/optional_fields_demo.rb)
  • Expanded test coverage for new features

Bug Fixes

  • Fixed field requirement validation to properly handle optional fields in dataset schemas

This release focuses on making Lancelot more flexible for real-world use cases where not all data fields are always present, and enables sophisticated hybrid search capabilities through RRF.
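The notes above mention "configurable alpha weighting" without giving the formula. One plausible reading, shown here purely as an assumption, is that alpha scales the vector list's RRF terms and (1 - alpha) scales the text list's terms. `weighted_rrf` is a hypothetical name, and Lancelot's actual weighting may differ.

```ruby
# Hypothetical alpha-weighted RRF over two ranked lists of document ids.
# ASSUMPTION: alpha scales the vector list's terms and (1 - alpha) the text
# list's terms; Lancelot's actual weighting formula may differ.
def weighted_rrf(vector_ids, text_ids, alpha: 0.5, k: 60)
  alpha = alpha.to_f # avoid integer division if alpha is passed as 0 or 1
  raise ArgumentError, "alpha must be between 0 and 1" unless alpha.between?(0.0, 1.0)

  scores = Hash.new(0.0)
  vector_ids.each_with_index { |id, i| scores[id] += alpha / (k + i + 1) } # ranks are 1-based
  text_ids.each_with_index { |id, i| scores[id] += (1.0 - alpha) / (k + i + 1) }
  scores.sort_by { |_id, score| -score }
end

vector_ids = %w[a b c]
text_ids   = %w[c a d]
p weighted_rrf(vector_ids, text_ids, alpha: 0.7).map(&:first) # => ["a", "c", "b", "d"]
```

With `alpha: 0.5`, every term is exactly half the standard RRF term, so the ordering matches unweighted RRF. Pushing alpha toward 1.0 or 0.0 lets one modality dominate.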

🎯 Lancelot 0.2.0 - Hybrid Search with Reciprocal Rank Fusion

27 Jul 18:45
0a44173

We're thrilled to announce Lancelot 0.2.0, bringing hybrid search to Ruby! This release introduces Reciprocal Rank Fusion (RRF), which lets you merge vector and text search results into a single ranking that captures both semantic and keyword matches.

🆕 What's New

Hybrid Search with RRF

The star feature of this release is the new hybrid_search method, which merges vector and full-text search results into one ranking using RRF:

  # Combine semantic vector search with keyword text search
  results = dataset.hybrid_search(
    "machine learning",                    # Text query
    vector: text_to_embedding("ML/AI"),    # Vector query; text_to_embedding stands in for your own embedding code
    vector_column: "embedding",
    text_column: "content",
    limit: 10
  )

  # Results include RRF scores for ranking
  results.each do |doc|
    puts "#{doc[:title]} - Score: #{doc[:rrf_score]}"
  end

Flexible Search Combinations

  • Same Query, Multiple Modalities: Use the same query for both vector and text search to capture both semantic and lexical matches
  • Different Queries per Modality: Use conceptual queries for vectors and specific keywords for text
  • Multi-Column Text Search: Search across multiple text columns while doing vector similarity
  • Custom Fusion Parameters: Tune the RRF k parameter to control result blending
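To see what tuning k actually does, this small sketch (assuming 1-based ranks and the per-list term 1/(k + rank)) compares how much a rank-1 hit outweighs a rank-10 hit. The ratio is (k + 10) / (k + 1): 5.5 at k = 1, but only about 1.15 at k = 60. Small k therefore lets the top of each list dominate, and large k blends deeper results more evenly. `rrf_term` is an illustrative helper, not a Lancelot method.

```ruby
# Per-list RRF contribution of a document at 1-based rank `rank`.
def rrf_term(rank, k)
  1.0 / (k + rank)
end

# Compare how much more a rank-1 hit counts than a rank-10 hit for two k values.
[1, 60].each do |k|
  ratio = rrf_term(1, k) / rrf_term(10, k)
  puts format("k=%d: rank 1 outweighs rank 10 by %.2fx", k, ratio)
end
```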

RankFusion Module

For advanced use cases, directly access the RRF algorithm:

  # Combine results from multiple independent searches
  vector_results = dataset.vector_search(embedding1, limit: 20)
  text_results = dataset.text_search("neural networks", limit: 20)
  keyword_results = dataset.text_search("PyTorch", limit: 20)

  fused = Lancelot::RankFusion.reciprocal_rank_fusion(
    [vector_results, text_results, keyword_results],
    k: 60
  )

🔬 Why RRF?

Reciprocal Rank Fusion is a robust fusion method with a single tuning constant, k, that:

  • Combines rankings from different search types without score normalization
  • Handles missing documents gracefully (documents that appear in only some result sets)
  • Gives stable results across diverse query types without per-query tuning
  • Uses the formula score(d) = Σ 1/(k + rank_i(d)), summed over every result list i that contains document d, with k = 60 by default
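The formula translates almost directly into Ruby. The sketch below fuses any number of ranked lists of document ids, using 1-based ranks. It is an illustration of the technique, not the source of Lancelot::RankFusion, which works on full result hashes rather than bare ids.

```ruby
# Minimal Reciprocal Rank Fusion over any number of ranked result lists.
# Each list is an array of document ids, best first; ranks are 1-based.
# Illustrative sketch only, not Lancelot::RankFusion's implementation.
def reciprocal_rank_fusion(result_lists, k: 60)
  scores = Hash.new(0.0)
  result_lists.each do |ids|
    ids.each_with_index do |id, index|
      scores[id] += 1.0 / (k + index + 1) # 1 / (k + rank)
    end
  end
  scores.sort_by { |_id, score| -score }.to_h # highest fused score first
end

vector_hits  = %w[doc3 doc1 doc7]
text_hits    = %w[doc1 doc9]
keyword_hits = %w[doc1 doc3]

fused = reciprocal_rank_fusion([vector_hits, text_hits, keyword_hits])
fused.each { |id, score| puts format("%-5s %.5f", id, score) }
```

Here doc1 comes out on top because it sits near the head of all three lists, even though it is only second in the vector results. A document missing from a list simply receives no term from that list, which is how RRF handles partial overlap.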

📈 Performance & Quality

  • Better Recall: Captures relevant documents that might rank poorly in one modality but well in another
  • Improved Precision: Documents appearing high in multiple result lists get boosted
  • Flexible Querying: Supports everything from single-modality searches to complex multi-query fusion
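A quick worked check of the "boosted" claim, using the RRF formula with k = 60 and 1-based ranks: a document ranked second in two lists outscores one ranked first in a single list, and even rank 10 in both lists still wins.

```ruby
# Worked check of the "boost" claim with score = sum of 1 / (k + rank), k = 60.
k = 60
top_of_one_list = 1.0 / (k + 1)                 # rank 1 in one list, absent from the other
second_in_both  = 1.0 / (k + 2) + 1.0 / (k + 2) # rank 2 in both lists
tenth_in_both   = 1.0 / (k + 10) * 2            # rank 10 in both lists

puts top_of_one_list.round(5) # prints 0.01639
puts second_in_both.round(5)  # prints 0.03226
puts tenth_in_both.round(5)   # prints 0.02857
```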

💎 Ruby-First Design

As always, we've kept the API idiomatic and intuitive:

  • Named parameters for clarity
  • Sensible defaults (k=60, limit=10)
  • Graceful handling of edge cases
  • Comprehensive error messages

🚀 Upgrade Guide

  gem update lancelot

The new features are additive: all existing code continues to work. To use hybrid search, first create both a vector index and a text index:

  dataset.create_vector_index("embedding")
  dataset.create_text_index("content")

🙏 Acknowledgments

Thanks to our contributors and the Ruby ML community for feedback and suggestions.


Documentation: Updated examples and API docs at https://github.com/cpetersen/lancelot

🚀 Lancelot 0.1.0 - Initial Release

27 Jul 18:31
cc97455

We're excited to announce the first release of Lancelot: Ruby bindings for Lance (https://github.com/lancedb/lance), a modern columnar data format for ML!

✨ Features

Core Functionality

  • Dataset Management: Create and open Lance datasets with Ruby-native APIs
  • Schema Support: Define schemas with multiple data types including vectors
  • Document Operations: Add, retrieve, and iterate through documents with full Enumerable support
  • Vector Search: Build ANN indices and perform fast similarity search
  • Full-Text Search: Create inverted indices for BM25-powered text search across single or multiple columns
  • SQL Filtering: Query datasets using SQL-like WHERE clauses
  • Ruby-First Design: Idiomatic Ruby API with operator overloading (<<), enumerable methods, and familiar patterns
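The feature list credits BM25 for full-text ranking. As a rough illustration of what BM25 rewards (rare query terms, repeated terms, shorter documents), here is a small pure-Ruby scorer. The tokenizer, k1 = 1.2, b = 0.75, and the Lucene-style IDF are common textbook choices assumed for this sketch; Lance's tokenizer and parameters may differ, and `bm25_scores` is not a Lancelot method.

```ruby
# Illustrative BM25 scorer: lowercase alphanumeric tokens, k1 = 1.2, b = 0.75,
# and IDF = ln(1 + (N - n + 0.5) / (n + 0.5)). These are assumptions;
# Lance's own tokenizer and parameters may differ.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

def bm25_scores(docs, query, k1: 1.2, b: 0.75)
  tokenized = docs.map { |doc| tokenize(doc) }
  n_docs = tokenized.size
  avg_len = tokenized.sum(&:size).to_f / n_docs

  tokenize(query).uniq.each_with_object(Array.new(n_docs, 0.0)) do |term, scores|
    containing = tokenized.count { |tokens| tokens.include?(term) }
    next if containing.zero? # term appears nowhere: contributes nothing

    idf = Math.log(1 + (n_docs - containing + 0.5) / (containing + 0.5))
    tokenized.each_with_index do |tokens, i|
      tf = tokens.count(term)
      next if tf.zero?

      # Length normalization: longer-than-average documents are penalized.
      norm = k1 * (1 - b + b * tokens.size / avg_len)
      scores[i] += idf * tf * (k1 + 1) / (tf + norm)
    end
  end
end

docs = [
  "Ruby is a dynamic programming language",
  "Rust is a systems programming language",
  "Lance is a columnar data format"
]
scores = bm25_scores(docs, "ruby programming")
ranked = scores.each_with_index.sort_by { |score, _i| -score }.map { |_score, i| docs[i] }
puts ranked.first
```

The Ruby document ranks first because it matches both terms, and "ruby" is rarer across the corpus than "programming", so it earns a higher IDF.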

Supported Data Types

  • Strings (:string)
  • Floating point (:float32, :float64)
  • Integers (:int32, :int64)
  • Booleans (:boolean)
  • Fixed-size vectors with configurable dimensions

🔧 Technical Details

  • Built with Magnus for Ruby-Rust interop
  • Embedded Tokio runtime for async Lance operations
  • Clean separation between Ruby API and Rust implementation
  • Comprehensive test coverage with RSpec

📚 Example Usage

  require 'lancelot'

  # Create a dataset
  dataset = Lancelot::Dataset.create("./my_data", schema: {
    title: :string,
    content: :string,
    embedding: { type: "vector", dimension: 384 }
  })

  # Add documents
  dataset << {
    title: "Introduction to Ruby",
    content: "Ruby is a dynamic programming language...",
    embedding: Array.new(384) { rand }  # placeholder; use a real 384-dim embedding from your model
  }

  # Create indices
  dataset.create_vector_index("embedding")
  dataset.create_text_index("content")

  # Search
  vector_results = dataset.vector_search(query_embedding, column: "embedding", limit: 10)
  text_results = dataset.text_search("ruby programming", column: "content", limit: 10)
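vector_search relies on an ANN index, which trades a little accuracy for speed. On a small sample, an exact brute-force search can serve as ground truth for spot-checking results. The sketch below works on plain Ruby hashes rather than a Lancelot dataset. Cosine similarity is an assumed metric here, since these notes do not state which distance Lancelot uses by default, and `exact_vector_search` is a hypothetical helper.

```ruby
# Exact (brute-force) nearest-neighbor search by cosine similarity.
# Illustrative only; the metric Lancelot uses by default is an assumption.
def cosine_similarity(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

# Scores every document against the query and keeps the `limit` best, best first.
def exact_vector_search(documents, query, column: :embedding, limit: 10)
  documents
    .map { |doc| [doc, cosine_similarity(doc.fetch(column), query)] }
    .max_by(limit) { |_doc, score| score }
end

documents = [
  { title: "Intro to Ruby", embedding: [0.9, 0.1, 0.0] },
  { title: "Rust ownership", embedding: [0.1, 0.9, 0.0] },
  { title: "Ruby blocks", embedding: [0.8, 0.2, 0.1] }
]
hits = exact_vector_search(documents, [1.0, 0.0, 0.0], limit: 2)
hits.each { |doc, score| puts format("%-15s %.3f", doc[:title], score) }
```

Comparing the ids returned by an ANN search against this exact ranking gives a simple recall estimate for a given index configuration.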

🎯 Who is this for?

Lancelot is perfect for Ruby developers who need:

  • Efficient storage and retrieval of embeddings
  • Combined vector and text search capabilities
  • A columnar format optimized for ML workloads
  • Integration with the Ruby ML ecosystem (works great with red-candle!)

🚧 Coming Soon

  • Hybrid search with Reciprocal Rank Fusion (RRF)

🙏 Acknowledgments

Special thanks to the Lance team for creating such a powerful columnar format, and to the Magnus project for making Ruby-Rust interop a pleasure to work with.


Installation: run gem install lancelot, or add gem 'lancelot' to your Gemfile

Documentation: https://github.com/cpetersen/lancelot

License: MIT