
Releases: scientist-labs/red-candle

Red Candle v1.3.0 Release Notes

13 Sep 16:44
f5d8072


🎉 Ruby 3.1+ Support

Expanded Compatibility: This release lowers the minimum required Ruby version from 3.2.0 to 3.1.0, making Red Candle accessible to more developers and systems.

✨ Major Features

Custom Tokenizer Support for All Models

All safetensors model types (Llama, Mistral, Gemma, Qwen, and Phi) now support custom tokenizers:

# Load a model with a specific tokenizer
llm = Candle::LLM.from_pretrained(
  "model-without-bundled-tokenizer",
  tokenizer: "appropriate-tokenizer-repo"
)

# Useful for models that don't include tokenizers or when using specialized tokenizers
llm = Candle::LLM.from_pretrained(
  "meta-llama/Llama-2-7b",
  tokenizer: "meta-llama/Llama-2-7b-chat-hf"  # Use chat-tuned tokenizer
)

Enhanced Phi Model Support

  • Phi-3-mini-128k models now work correctly (see the sketch after this list)
  • Robust configuration preprocessing handles non-standard fields
  • Automatic handling of gegelu activation and ff_intermediate_size parameters
  • Improved compatibility with various Phi-3 model variants
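
For example, a long-context Phi-3 model should now load without manual config fixes (a minimal sketch; microsoft/Phi-3-mini-128k-instruct is an illustrative repo id):

require 'candle'

# Config preprocessing handles the non-standard gegelu activation and
# ff_intermediate_size fields behind the scenes
llm = Candle::LLM.from_pretrained("microsoft/Phi-3-mini-128k-instruct")
response = llm.generate("Summarize the benefits of long-context models:")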

Reranker Performance Improvements

  • Configurable max length for better performance tuning
  • Automatic truncation of long inputs to prevent errors
  • Significant performance optimizations for document ranking tasks

# Configure max length for optimal performance
reranker = Candle::Reranker.from_pretrained(
  "BAAI/bge-reranker-base",
  max_length: 512  # Customize based on your use case
)
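
Long inputs are then truncated to max_length before scoring (a hedged usage sketch; the rerank method and its return shape follow the project README and may differ slightly):

# Documents longer than max_length are truncated rather than raising an error
results = reranker.rerank("ruby machine learning", [
  "Red Candle brings Candle ML models to Ruby.",
  "A very long, mostly unrelated document that would previously overflow the model."
])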

🛡️ Security & Stability

Dependency Updates

  • Security Fix: Resolved PyO3 buffer overflow vulnerability (CVE in PyO3 < 0.24.1)
  • Major Updates:
    • tokenizers: 0.21.1 → 0.22.0
    • outlines-core: 0.2 → 0.2.11 (eliminates PyO3 dependency)
    • Most Rust dependencies updated to latest stable versions
  • Note: hf-hub downgraded from 0.4.3 to 0.4.1 for compatibility

Improved Error Messages

Error messages now provide actionable guidance:

  • Missing tokenizers suggest explicit tokenizer parameter
  • Authentication errors mention the HF_TOKEN requirement (example below)
  • Network issues include connectivity troubleshooting tips
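
For instance, a gated repository that previously failed with an opaque error now points at HF_TOKEN (a sketch; exporting the token in your shell is the usual remedy):

# Export a Hugging Face access token before launching Ruby:
#   export HF_TOKEN=<your token>
llm = Candle::LLM.from_pretrained("meta-llama/Llama-2-7b")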

🔧 Developer Experience

Logging System

New Ruby logger integration for cleaner output:

  • Controlled via CANDLE_VERBOSE environment variable
  • Reduced noise from Rust internals
  • Better debugging experience

Repository Migration

  • Moved from assaydepot to scientist-labs organization
  • Updated all documentation and CI/CD pipelines
  • Maintained backward compatibility

📊 Testing

  • Comprehensive test suite for custom tokenizer functionality
  • All tests passing with improved coverage
  • Added specs for truncation and max length configuration
  • Security vulnerability regression tests

🔄 Migration Guide

From v1.2.x to v1.3.0

  1. Ruby Version: Ensure you have Ruby 3.1.0 or higher (previously required 3.2.0)

  2. Custom Tokenizers: If you have models failing due to missing tokenizers, you can now specify one:

    # Before: Would fail
    llm = Candle::LLM.from_pretrained("model-without-tokenizer")
    
    # After: Works with explicit tokenizer
    llm = Candle::LLM.from_pretrained(
      "model-without-tokenizer",
      tokenizer: "appropriate-tokenizer"
    )
  3. Reranker Configuration: Take advantage of performance tuning:

    # Optimize for shorter documents
    reranker = Candle::Reranker.from_pretrained(
      "BAAI/bge-reranker-base",
      max_length: 256
    )
  4. Logging: Use the new logging system:

    # Enable verbose logging
    CANDLE_VERBOSE=1 ruby your_script.rb

🐛 Bug Fixes

  • Fixed regex validation security vulnerabilities
  • Resolved build issues with certain dependency combinations
  • Fixed Phi-3 model loading failures
  • Corrected truncation behavior in rerankers

📝 Documentation

  • Updated MODEL_SUPPORT.md with current compatibility matrix
  • Enhanced HUGGINGFACE.md with custom tokenizer examples
  • Added comprehensive examples for new features

📦 Installation

gem install red-candle -v 1.3.0

Or add to your Gemfile:

gem 'red-candle', '~> 1.3.0'

Red Candle v1.2.0 Release

10 Aug 03:04
ae1a949


🎯 Major API Standardization

This release brings significant API improvements focused on consistency and developer experience.

Standardized from_pretrained Interface

All model classes now use a consistent from_pretrained method for loading models:

# Before (v1.1.0) - inconsistent APIs
embedding = Candle::EmbeddingModel.new(repo: "BAAI/bge-base-en-v1.5")
reranker = Candle::Reranker.new(repo: "BAAI/bge-reranker-base")
llm = Candle::LLM.new(model_path: "path/to/model")

# After (v1.2.0) - standardized
embedding = Candle::EmbeddingModel.from_pretrained("BAAI/bge-base-en-v1.5")
reranker = Candle::Reranker.from_pretrained("BAAI/bge-reranker-base")
llm = Candle::LLM.from_pretrained("TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF", 
                                  gguf_file: "tinyllama-1.1b-chat-v1.0.Q3_K_S.gguf")

Consistent Parameter Naming

  • Renamed repo → model_id across all model types
  • Standardized device parameter handling
  • Unified tokenizer specification approach

Symbol-keyed Hashes

Model metadata and configuration hashes now use symbol keys, in keeping with Ruby idioms:

# Configuration hashes use symbols
config = { temperature: 0.7, max_length: 100, seed: 42 }
llm.generate("Hello", config: config)

🧪 Testing Infrastructure Overhaul

Migration to RSpec

  • Complete migration from Minitest to RSpec
  • 274 comprehensive specs with 82.73% code coverage
  • Shared examples for common model architectures
  • Improved test organization and maintainability

CI/CD Improvements

  • S3 Model Caching: Pre-downloaded models cached in AWS S3 for faster CI runs
  • HuggingFace Rate Limiting: Eliminated HTTP 429 errors with offline mode
  • Optimized Test Pipeline: Removed Valgrind from regular CI (20+ min → 2 min)
  • Separate workflows for memory profiling

🚀 Additional Improvements

  • Enhanced error messages with actionable solutions
  • Improved tokenizer auto-detection for GGUF models
  • Better device compatibility testing
  • Consolidated model caching logic
  • Documentation reorganization (moved to docs/ directory)
  • New Red Candle logo assets

📦 Dependency Updates

  • Added RSpec as development dependency
  • Updated rake tasks for new test framework

🔄 Breaking Changes

  • EmbeddingModel.new(repo:) → EmbeddingModel.from_pretrained(model_id)
  • Reranker.new(repo:) → Reranker.from_pretrained(model_id)
  • LLM.new(model_path:) → LLM.from_pretrained(model_id)
  • Test suite now uses RSpec instead of Minitest

🔧 Migration Guide

Update your code to use the new standardized API:

# Update model loading
model = Candle::EmbeddingModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

# Use symbolic keys in configurations
config = { temperature: 0.8, max_length: 200 }

For testing, switch from Minitest to RSpec patterns and run tests with rake spec instead of rake test.

Red Candle 1.1.0 Release 🚀

27 Jul 03:07
a383e6c


We're excited to announce Red Candle 1.1.0, our first major update bringing powerful structured generation capabilities and expanded model support!

🎯 Structured Generation with Outlines

Red Candle now integrates outlines-core to enable structured generation. Control your model outputs with JSON schemas or regular expressions to ensure you always get valid, parseable results.

Quick Examples

JSON Schema Constraints

require 'candle'

llm = Candle::LLM.from_pretrained("TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
                                  gguf_file: "mistral-7b-instruct-v0.2.Q4_K_M.gguf")

# Define your schema
schema = {
  type: "object",
  properties: {
    answer: { type: "string", enum: ["yes", "no"] },
    confidence: { type: "number", minimum: 0, maximum: 1 }
  },
  required: ["answer"]
}

# Get structured output (automatically parsed!)
result = llm.generate_structured("Is Ruby a programming language?", schema: schema)
puts result
# => {"answer"=>"yes", "confidence"=>0.95}

Regular Expression Constraints

# Phone numbers
phone_constraint = llm.constraint_from_regex('\d{3}-\d{3}-\d{4}')
config = Candle::GenerationConfig.balanced(constraint: phone_constraint)
phone = llm.generate("Generate a phone number:", config: config)
# => "555-123-4567"

# Dates
date_constraint = llm.constraint_from_regex('\d{4}-\d{2}-\d{2}')
date = llm.generate("Today's date:", config: Candle::GenerationConfig.new(constraint: date_constraint))
# => "2024-12-19"

🆕 New Model Support

Phi Models

Red Candle now supports Microsoft's Phi family of models:

# Phi-2 (2.7B parameters)
llm = Candle::LLM.from_pretrained("microsoft/phi-2")
result = llm.generate("Write a factorial function in Python:")

# Phi-2 GGUF (quantized)
llm = Candle::LLM.from_pretrained("TheBloke/phi-2-GGUF",
                                  gguf_file: "phi-2.Q4_K_M.gguf")

# Phi-3 with chat interface
llm = Candle::LLM.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
messages = [
  { role: "system", content: "You are a helpful coding assistant." },
  { role: "user", content: "How do I reverse a string in Ruby?" }
]
response = llm.chat(messages)

Qwen Models

Support for Alibaba's Qwen family:

# Qwen2 GGUF models (recommended)
llm = Candle::LLM.from_pretrained("Qwen/Qwen2-7B-Instruct-GGUF",
                                  gguf_file: "qwen2-7b-instruct-q4_k_m.gguf")

# Qwen2.5 models
llm = Candle::LLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct-GGUF",
                                  gguf_file: "qwen2.5-7b-instruct-q4_k_m.gguf")

# Chat with Qwen
messages = [
  { role: "user", content: "Explain Ruby blocks in simple terms" }
]
response = llm.chat(messages)

Note: Qwen3 GGUF support requires candle-transformers > 0.9.1. We recommend using Qwen2 or Qwen2.5 models for now.

📊 Structured Generation + New Models

Combine structured generation with any supported model:

# Use Phi for structured code generation
phi = Candle::LLM.from_pretrained("microsoft/phi-2")
code_schema = {
  type: "object",
  properties: {
    language: { type: "string", enum: ["python", "ruby", "javascript"] },
    code: { type: "string" },
    complexity: { type: "string", enum: ["simple", "intermediate", "advanced"] }
  }
}
code = phi.generate_structured("Write a quicksort implementation", schema: code_schema)

# Use Qwen for structured analysis
qwen = Candle::LLM.from_pretrained("Qwen/Qwen2-7B-Instruct-GGUF",
                                   gguf_file: "qwen2-7b-instruct-q4_k_m.gguf")
sentiment_schema = {
  type: "object",
  properties: {
    sentiment: { type: "string", enum: ["positive", "negative", "neutral"] },
    score: { type: "number", minimum: -1, maximum: 1 }
  }
}
analysis = qwen.generate_structured("Ruby makes programming fun!", schema: sentiment_schema)

🚀 Performance Tips

  1. Model Selection: Larger models (7B+) have 90-95% success rate with complex schemas
  2. Temperature: Use lower temperatures (0.1-0.5) for more reliable structured generation (see the sketch after this list)
  3. Max Length: Ensure sufficient tokens for complete JSON output
  4. Schema Design: Keep schemas as simple as possible while meeting requirements
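
Putting tips 2 and 3 into practice (a hedged sketch; passing a GenerationConfig to generate_structured via config: is an assumption based on the generate examples above):

# Low temperature plus a generous max_length for complete JSON output
config = Candle::GenerationConfig.new(temperature: 0.2, max_length: 512)
result = llm.generate_structured("Classify this review:", schema: schema, config: config)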

📦 Installation

gem install red-candle

Or add to your Gemfile:

gem 'red-candle', '~> 1.1.0'

🔧 Breaking Changes

None! This release maintains full backward compatibility with 1.0.x.

🐛 Bug Fixes

  • Improved tokenizer auto-detection for GGUF models
  • Better error messages for model loading failures
  • Fixed memory leaks in streaming generation

🙏 Acknowledgments

Thanks to the outlines-core team for their excellent structured generation library, and to all contributors who helped with testing and feedback!


Ready to add structure to your LLM outputs? Update to 1.1.0 and start generating exactly what you need! 🎯✨

Red Candle 1.0.0 Release 🚀

27 Jul 02:57
8fd0179


We're thrilled to announce the 1.0.0 release of red-candle, bringing the power of Hugging Face's Candle ML library to the Ruby ecosystem!

What is Red Candle?

Red Candle is a Ruby gem that provides native access to state-of-the-art machine learning models through Rust bindings. It enables Ruby developers to leverage modern NLP capabilities with the performance and efficiency of Rust.

Features

🤖 Large Language Models (LLMs)

  • Support for both quantized (GGUF) and unquantized (SafeTensors) formats
  • Built-in support for popular models:
    • Mistral - Efficient and powerful instruction-following models
    • Gemma - Google's lightweight, open models
    • Llama - Meta's family of foundation models
  • Streaming generation with customizable parameters
  • Chat interface with automatic template formatting

📊 NLP Tools

  • Embedding Models - Generate high-quality text embeddings for semantic search and similarity (see the sketch after this list)
  • Rerankers - Improve search relevance with cross-encoder models
  • Named Entity Recognition (NER) - Extract entities from text with pre-trained models
  • Tokenizers - Direct access to model tokenization with full vocabulary control
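
A quick taste of the embedding and reranking tools (a minimal sketch using the 1.0-era constructors; the embedding and rerank method names are assumptions based on the project README):

require 'candle'

# Embed a sentence for semantic search
model = Candle::EmbeddingModel.new(repo: "BAAI/bge-base-en-v1.5")
vector = model.embedding("Ruby is a joy to write")

# Improve search relevance with a cross-encoder reranker
reranker = Candle::Reranker.new(repo: "BAAI/bge-reranker-base")
ranked = reranker.rerank("fast ml inference",
                         ["Candle is a Rust ML framework.", "Cooking with candles."])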

🚀 Performance

  • Hardware acceleration support:
    • Metal (Apple Silicon)
    • CUDA (NVIDIA GPUs)
    • Optimized CPU inference
  • Automatic device detection and selection (explicit override shown below)
  • Memory-efficient model loading
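
Devices are selected automatically, but can be overridden (a sketch; the Candle::Device constructors and the device: keyword follow the project README and are assumptions here):

# Force CPU; Candle::Device.metal or Candle::Device.cuda where available
device = Candle::Device.cpu
llm = Candle::LLM.from_pretrained("TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
                                  gguf_file: "mistral-7b-instruct-v0.2.Q4_K_M.gguf",
                                  device: device)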

Getting Started

require 'candle'

# Load an LLM
llm = Candle::LLM.from_pretrained("TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
                                  gguf_file: "mistral-7b-instruct-v0.2.Q4_K_M.gguf")

# Generate text
response = llm.generate("What is Ruby?")

# Or use the chat interface
messages = [
  { role: "user", content: "Explain Ruby in one sentence." }
]
response = llm.chat(messages)

Installation

gem install red-candle

Requirements

  • Ruby 3.0+
  • Rust toolchain (for compilation)
  • Optional: CUDA toolkit or Metal framework for GPU acceleration

What's Next?

This is just the beginning! We're committed to expanding Red Candle's capabilities and would love your feedback and contributions. Check out our GitHub repository for examples, documentation, and to report issues.

Acknowledgments

Special thanks to the Hugging Face team for the incredible Candle library, and to all our early testers and contributors who helped shape this release.


Happy coding with Red Candle! 🕯️✨