
feat: LLM Router extension for cost-optimized model selection#476

Open
bsbodden wants to merge 11 commits into main from llm-router

@bsbodden
Collaborator

Adds LLMRouter and AsyncLLMRouter, a new RedisVL extension that uses Redis vector search to route each query to the cheapest LLM capable of handling it. This is
the natural complement to SemanticCache/LangCache: caching eliminates redundant calls, and routing optimizes the calls you must make.

  • "hello, how are you?" → GPT-4.1 Nano ($0.10/M tokens)
  • "explain garbage collection" → Claude Sonnet 4.5 ($3/M tokens)
  • "architect a distributed system" → Claude Opus 4.5 ($5/M tokens)
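A minimal sketch of the tier-selection idea, in plain Python so it runs without Redis or an embedding model. `TierSketch`, `route`, the toy 2-D vectors, and taking each tier's distance as its nearest reference (min) are all assumptions for illustration, not the extension's API; the router itself delegates the distance computation to Redis vector search. A tier qualifies only if its distance falls within that tier's own threshold, and the closest qualifying tier wins.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, so values fall in [0, 2] like the tier thresholds."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


@dataclass
class TierSketch:
    """Hypothetical stand-in for ModelTier: a name, reference embeddings, a threshold."""
    name: str
    reference_vectors: List[List[float]]
    distance_threshold: float


def route(query_vector: Sequence[float], tiers: List[TierSketch]) -> Optional[str]:
    """Return the closest tier whose nearest reference is within its own threshold."""
    best_name: Optional[str] = None
    best_distance = math.inf
    for tier in tiers:
        # Assumption: a tier's distance is that of its nearest reference phrase.
        distance = min(cosine_distance(query_vector, ref) for ref in tier.reference_vectors)
        if distance <= tier.distance_threshold and distance < best_distance:
            best_name, best_distance = tier.name, distance
    return best_name  # None means no tier matched


tiers = [
    TierSketch("simple", [[1.0, 0.0]], distance_threshold=0.3),
    TierSketch("expert", [[0.0, 1.0]], distance_threshold=0.3),
]
```

With real embeddings, a greeting would sit near the simple tier's reference phrases and an architecture question near the expert tier's.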

Why this matters

Enterprise LLM spend reached $8.4B (Menlo Ventures, mid-2025) and 53% of AI teams exceed cost forecasts by 40%+. A common root cause: every query hits the most
expensive model. Academic research (RouteLLM/ICLR 2025, FrugalGPT/Stanford) shows 30-85% cost savings from intelligent routing. A funded startup ecosystem
validates the category: OpenRouter ($500M valuation, $40M raised), Martian (Accenture-backed), NotDiamond (IBM/SAP-backed), Unify (YC/Microsoft-backed).

RedisVL's LLM Router is the first open-source, Redis-native, self-hosted, multi-tier routing solution. Combined with LangCache/SemanticCache, it forms a
complete cost optimization stack no competitor offers.

Key features

  • Pretrained config: Ships with a 3-tier Bloom's Taxonomy config (simple/standard/expert) with 18 reference phrases per tier and pre-computed embeddings — zero
    setup required
  • Cost-aware routing: Optional cost penalty biases toward cheaper tiers when distances are close
  • LiteLLM-compatible: Model strings (provider/model) work directly with LiteLLM's 100+ providers
  • Per-tier thresholds: Each tier has independent distance thresholds for fine-grained control
  • Full async support: AsyncLLMRouter with create() factory pattern
  • Portable configs: Export/import routers with pre-computed embeddings via export_with_embeddings() / from_pretrained()
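The cost-aware option can be sketched as a small scoring rule. The formula below (distance plus a weighted, normalized price) and the names `pick_tier`, `COST_PER_M`, and `penalty_weight` are hypothetical; the PR only states that an optional penalty biases toward cheaper tiers when distances are close. The prices are the per-million-token figures quoted above.

```python
from typing import Dict, List, Optional, Tuple

# Per-million-token prices quoted in this PR description.
COST_PER_M: Dict[str, float] = {"simple": 0.10, "standard": 3.00, "expert": 5.00}


def pick_tier(
    candidates: List[Tuple[str, float]],
    cost_per_m: Dict[str, float],
    penalty_weight: float = 0.05,
) -> Optional[str]:
    """Pick among (tier, distance) pairs that already passed their thresholds.

    Hypothetical scoring: distance plus a penalty proportional to relative cost,
    so a cheaper tier wins when distances are close but not when they are far apart.
    """
    if not candidates:
        return None
    max_cost = max(cost_per_m[name] for name, _ in candidates)

    def score(item: Tuple[str, float]) -> float:
        name, distance = item
        # Normalizing by the priciest candidate keeps the penalty within [0, weight].
        return distance + penalty_weight * (cost_per_m[name] / max_cost)

    return min(candidates, key=score)[0]
```

With the default weight, a 0.02 distance gap is not enough to justify a 30x price difference, while a 0.40 gap is.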

Adds intelligent LLM model routing using semantic similarity:

- ModelTier: Define model tiers with references and thresholds
- LLMRouter: Route queries to optimal model tier
- LLMRouteMatch: Routing result with tier, model, confidence
- Cost optimization: Prefer cheaper tiers when distances close
- Pretrained support: Export/import with pre-computed embeddings

Integration tests define expected behavior (test-first approach).

Part of redis-vl-python enhancement for intelligent LLM auto-selection.
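`LLMRouteMatch` carries the routing result (tier, model, confidence), and the tests in this PR also check that it is truthy or falsy. A plain-dataclass sketch of that behavior; `RouteMatchSketch`, its field types, and `choose_model` are assumptions, not the schema's actual definition.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class RouteMatchSketch:
    """Hypothetical stand-in for LLMRouteMatch; field types are assumptions."""
    tier: Optional[str] = None
    model: Optional[str] = None
    confidence: Optional[float] = None
    alternatives: List[str] = field(default_factory=list)  # other tiers that also matched
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __bool__(self) -> bool:
        # Truthy only when a tier was selected, so `if route_match:` reads naturally.
        return self.tier is not None


def choose_model(route_match: RouteMatchSketch, fallback: str) -> str:
    # An empty (falsy) match falls back to a default model instead of failing.
    if route_match and route_match.model:
        return route_match.model
    return fallback
```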

Tests for:
- ModelTier validation (name, model, references, threshold bounds)
- LLMRouteMatch (truthy/falsy, alternatives, metadata)
- RoutingConfig (defaults, custom values, bounds)
- Pretrained schemas (reference, tier, config)
- DistanceAggregationMethod enum
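Assuming `DistanceAggregationMethod` mirrors the avg/min/sum options of RedisVL's existing SemanticRouter (an assumption; this PR's enum members are not listed here), aggregation reduces a tier's per-reference distances to a single number. `AggregationSketch` and `aggregate` are hypothetical names.

```python
from enum import Enum
from statistics import fmean
from typing import Sequence


class AggregationSketch(Enum):
    """Assumed members, mirroring SemanticRouter's avg/min/sum options."""
    avg = "avg"
    min = "min"
    sum = "sum"


def aggregate(distances: Sequence[float], method: AggregationSketch) -> float:
    """Collapse a tier's per-reference distances into one tier distance."""
    if not distances:
        raise ValueError("no reference distances to aggregate")
    if method is AggregationSketch.avg:
        return fmean(distances)
    if method is AggregationSketch.min:
        return min(distances)
    # sum: total distance over all references of the tier
    return sum(distances)
```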

- Fix from_pretrained() to use model_construct() instead of object.__new__()
- Update test_cost_optimization_prefers_cheaper to use matching query
- Update test_add_tier_references to verify references added correctly
- Add tests/unit/conftest.py to skip Docker fixtures for unit tests
- Add tests/integration/conftest.py to use local Redis when available

- test_add_tier_references now verifies reference addition without strict routing
- Cost optimization test uses query that better matches references
- All 22 integration tests should now pass

- Problem statement and existing solution limitations
- Architecture diagrams and key design decisions
- API examples and comparison with SemanticRouter
- Testing guide and future enhancements
…eddings

Add a built-in 3-tier pretrained configuration (simple/standard/expert)
grounded in Bloom's Taxonomy with 18 reference phrases per tier and
pre-computed embeddings from sentence-transformers/all-mpnet-base-v2.

Includes generation script and pretrained loader for named configs.
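Shipping pre-computed embeddings means loading a config needs no embedding calls. A rough sketch of that round trip, assuming a JSON layout with `name`, `tiers`, `references`, `text`, and `vector` keys (hypothetical, not the actual file format); the loader only checks vector dimensions, which for all-mpnet-base-v2 would be 768.

```python
import json
from typing import Any, Dict, List


def export_with_vectors(name: str, tiers: List[Dict[str, Any]]) -> str:
    """Serialize a config whose references carry their precomputed vectors."""
    return json.dumps({"name": name, "tiers": tiers})


def load_pretrained(payload: str, expected_dims: int) -> Dict[str, Any]:
    """Load a config without calling an embedding model, checking vector shapes."""
    config = json.loads(payload)
    for tier in config["tiers"]:
        for ref in tier["references"]:
            if len(ref["vector"]) != expected_dims:
                raise ValueError(
                    f"{tier['name']}: reference {ref['text']!r} has "
                    f"{len(ref['vector'])} dims, expected {expected_dims}"
                )
    return config
```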

Add AsyncLLMRouter with async factory pattern (create() classmethod),
mirroring all sync LLMRouter functionality with async I/O. Update
module exports and correct simple tier model to openai/gpt-4.1-nano
for accurate cost optimization.
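`__init__` cannot `await`, which is why an async class commonly exposes an `async` classmethod factory. A stripped-down sketch of that pattern; `AsyncRouterSketch` and `_initialize` are hypothetical, and `asyncio.sleep(0)` stands in for the real awaited index setup.

```python
import asyncio
from typing import List


class AsyncRouterSketch:
    """Hypothetical class showing only the async factory pattern, not the real router."""

    def __init__(self) -> None:
        # __init__ cannot await, so it only sets up empty state.
        self.tiers: List[str] = []
        self.ready = False

    @classmethod
    async def create(cls, tier_names: List[str]) -> "AsyncRouterSketch":
        router = cls()
        await router._initialize(tier_names)  # async setup happens here instead
        return router

    async def _initialize(self, tier_names: List[str]) -> None:
        await asyncio.sleep(0)  # stand-in for awaiting index creation and reference loading
        self.tiers = list(tier_names)
        self.ready = True


async def main() -> AsyncRouterSketch:
    return await AsyncRouterSketch.create(["simple", "standard", "expert"])
```

Callers then write `router = await AsyncLLMRouter.create(...)` instead of calling the constructor directly.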

Add comprehensive async integration tests mirroring all sync tests
with AsyncLLMRouter.create() factory. Add pretrained config tests
for default 3-tier routing. Update model references and pricing
assertions to match corrected tier definitions.

Add comprehensive Jupyter notebook (13_llm_router.ipynb) covering
pretrained routing, custom tiers, cost optimization, tier management,
serialization, and async usage. Update DESIGN.md with async support,
pretrained config details, and corrected model pricing.
Copilot AI review requested due to automatic review settings February 16, 2026 22:27
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces an LLM Router extension for RedisVL that enables cost-optimized model selection through semantic routing. The router uses Redis vector search to match queries to model tiers based on semantic similarity to reference phrases, allowing applications to route simple queries to cheaper models and complex queries to more capable (expensive) models.

Changes:

  • New LLMRouter and AsyncLLMRouter classes for intelligent model tier selection
  • Pretrained configuration system with built-in "default" config featuring 3 tiers (simple/standard/expert)
  • Comprehensive test suite including unit tests and integration tests for both sync and async implementations

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| redisvl/extensions/llm_router/router.py | Core implementation of sync and async LLM routers with routing logic and tier management |
| redisvl/extensions/llm_router/schema.py | Pydantic models for ModelTier, LLMRouteMatch, RoutingConfig, and pretrained configurations |
| redisvl/extensions/llm_router/__init__.py | Public API exports for the extension |
| redisvl/extensions/llm_router/pretrained/__init__.py | Loader for pretrained router configurations |
| scripts/generate_pretrained_config.py | Script to generate pretrained configs with embedded reference vectors |
| tests/unit/test_llm_router_schema.py | Unit tests for schema validation and Pydantic models |
| tests/unit/conftest.py | Test configuration to allow unit tests without Docker/Redis |
| tests/integration/test_llm_router.py | Integration tests for sync LLMRouter functionality |
| tests/integration/test_async_llm_router.py | Integration tests for async AsyncLLMRouter functionality |
| tests/integration/conftest.py | Configuration for integration tests with optional Docker override |
| redisvl/extensions/llm_router/DESIGN.md | Comprehensive design documentation |
| docs/user_guide/13_llm_router.ipynb | User guide notebook with examples and usage patterns |


…assmethods

The from_pretrained and from_existing methods (sync and async) ignored a
provided redis_client because redis_url defaults to "redis://localhost:6379"
and was always truthy. This caused ConnectionRefusedError in CI where Redis
runs on a dynamic testcontainer port.
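The precedence bug described above comes down to which argument is checked first. A minimal sketch with hypothetical helper names, returning only which source would be used; the real classmethods then build a connection from it.

```python
from typing import Optional

DEFAULT_REDIS_URL = "redis://localhost:6379"


def source_buggy(redis_client: Optional[object] = None, redis_url: str = DEFAULT_REDIS_URL) -> str:
    # Checking the URL first: the default is a non-empty string, so this branch always wins.
    if redis_url:
        return "redis_url"
    return "redis_client"


def source_fixed(redis_client: Optional[object] = None, redis_url: str = DEFAULT_REDIS_URL) -> str:
    # An explicitly passed client takes precedence; the URL is only a fallback.
    if redis_client is not None:
        return "redis_client"
    return "redis_url"
```

Defaulting `redis_url` to `None` and checking it explicitly is an alternative design.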

- Validate threshold range (0, 2] in update_tier_threshold before
  assignment, matching the ModelTier Pydantic schema constraint.
- Guard _get_tier_matches against empty tiers list to prevent
  ValueError from max() on empty sequence.

Applied to both sync and async implementations.
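Both guards are small. The sketch below uses hypothetical names; treating the `max()` call in `_get_tier_matches` as a maximum over per-tier thresholds is an assumption about what that method computes.

```python
from typing import List, Optional


def validate_threshold(value: float) -> float:
    """Reject thresholds outside (0, 2], the same bounds the ModelTier schema enforces."""
    if not 0 < value <= 2:
        raise ValueError(f"distance_threshold must be in (0, 2], got {value}")
    return value


def widest_threshold(thresholds: List[float]) -> Optional[float]:
    """max() raises ValueError on an empty sequence, so return None when there are no tiers."""
    if not thresholds:
        return None
    return max(thresholds)
```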
Copilot AI review requested due to automatic review settings February 17, 2026 00:45
@bsbodden bsbodden self-assigned this Feb 17, 2026
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 9 comments.



overwrite: bool = False,
cost_optimization: bool = False,
connection_kwargs: Dict[str, Any] = {},
**kwargs,

Copilot AI Feb 17, 2026


**kwargs is accepted but never used in __init__, so any caller-provided extra keyword args are silently ignored (and won’t reach Redis connection setup). Either remove **kwargs to avoid surprising behavior or merge/forward it into connection_kwargs / _initialize_index.

Suggested change, removing
**kwargs,

Copilot uses AI. Check for mistakes.
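Of the two options this comment offers, forwarding can be sketched as a merge step. `merge_connection_kwargs` is a hypothetical helper; `socket_timeout` and `decode_responses` are ordinary redis-py client keyword arguments used only as sample keys.

```python
from typing import Any, Dict, Optional


def merge_connection_kwargs(
    connection_kwargs: Optional[Dict[str, Any]] = None, **kwargs: Any
) -> Dict[str, Any]:
    """Fold stray keyword args into the connection settings instead of dropping them."""
    merged = dict(connection_kwargs or {})  # copy: never mutate the caller's dict
    for key, value in kwargs.items():
        # Design choice: explicit connection_kwargs win if a key appears in both.
        merged.setdefault(key, value)
    return merged
```

Letting explicit `connection_kwargs` win on conflicts is one reasonable choice; raising on conflicts would be stricter.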
metadata: Dict[str, Any] = Field(default_factory=dict)
"""Tier metadata."""

distance_threshold: float = 0.5

Copilot AI Feb 17, 2026


distance_threshold in PretrainedTier has no bounds validation, unlike ModelTier (gt=0, le=2). Invalid values could be loaded from JSON and break routing/filtering. Use the same constrained/strict field definition here for consistency and safety.

Suggested change, replacing
distance_threshold: float = 0.5
with
distance_threshold: Annotated[float, Field(strict=True, gt=0, le=2)] = 0.5

Comment on lines +1433 to +1438
index = AsyncSearchIndex(
schema=schema,
redis_client=redis_client,
)
await index.create(overwrite=True, drop=False)


Copilot AI Feb 17, 2026


Async from_pretrained() also recreates the index with overwrite=True unconditionally, which can clobber an existing router/index with the same name. Add an explicit overwrite parameter and default to non-destructive behavior.

import yaml
from pydantic import BaseModel, ConfigDict, Field, PrivateAttr
from redis.commands.search.aggregation import AggregateRequest, AggregateResult, Reducer
from redis.exceptions import ResponseError

Copilot AI Feb 17, 2026


Unused imports here (e.g., AggregateResult, Reducer) will trigger pylint failures and should be removed or used. Consider importing only AggregateRequest from redis.commands.search.aggregation.

Suggested change
from redis.exceptions import ResponseError

PretrainedTier,
RoutingConfig,
)
from redisvl.extensions.router.schema import Route, SemanticRouterIndexSchema

Copilot AI Feb 17, 2026


Route is imported but never used; this will fail pylint’s unused-import check. Remove the Route import (keep SemanticRouterIndexSchema).

Suggested change, replacing
from redisvl.extensions.router.schema import Route, SemanticRouterIndexSchema
with
from redisvl.extensions.router.schema import SemanticRouterIndexSchema

schema=schema,
redis_client=redis_client,
)
index.create(overwrite=True, drop=False)

Copilot AI Feb 17, 2026


from_pretrained() unconditionally recreates the index with overwrite=True. This is destructive if an index with the same name already exists in the target Redis. Consider adding an overwrite: bool = False parameter (mirroring __init__) and erroring when the index exists unless overwrite is explicitly requested.

Suggested change, replacing
index.create(overwrite=True, drop=False)
with
index.create(overwrite=False, drop=False)
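A non-destructive default can be sketched with a plain dict standing in for the Redis indexes (an assumption for illustration; `create_index` is hypothetical). It errors on an existing name unless `overwrite=True`, as the comment proposes; the async `from_pretrained()` would apply the same check before awaiting index creation.

```python
from typing import Dict, List


def create_index(
    store: Dict[str, List[str]], name: str, docs: List[str], overwrite: bool = False
) -> None:
    """Create an index in `store`, refusing to replace an existing one unless asked."""
    if name in store and not overwrite:
        raise ValueError(f"index {name!r} already exists; pass overwrite=True to replace it")
    store[name] = list(docs)
```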

Comment on lines +839 to +842
overwrite: bool = False,
cost_optimization: bool = False,
connection_kwargs: Dict[str, Any] = {},
**kwargs,

Copilot AI Feb 17, 2026


Async factory has the same mutable-default issue: connection_kwargs defaults to {}. Switch to None and instantiate a new dict inside create() to avoid shared state and pylint warnings.

Comment on lines +840 to +843
cost_optimization: bool = False,
connection_kwargs: Dict[str, Any] = {},
**kwargs,
) -> "AsyncLLMRouter":

Copilot AI Feb 17, 2026


Async create() accepts **kwargs but never uses it, so extra keyword args are silently dropped. Either remove **kwargs or forward/merge them into the Redis connection kwargs (consistent with from_existing).

Comment on lines +100 to +103
overwrite: bool = False,
cost_optimization: bool = False,
connection_kwargs: Dict[str, Any] = {},
**kwargs,

Copilot AI Feb 17, 2026


connection_kwargs uses a mutable default ({}), which can leak state across instances and is flagged by pylint. Use None as the default and create a new dict inside the method (or use Field(default_factory=dict) style patterns).
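The mutable-default pitfall described in this comment (and in the matching one on the async `create()`) can be reproduced in a few lines. Both function names are hypothetical; the fix is the usual `None` sentinel plus a fresh dict per call.

```python
from typing import Any, Dict, Optional


def shared_default(connection_kwargs: Dict[str, Any] = {}) -> Dict[str, Any]:  # pylint: disable=dangerous-default-value
    # The {} is created once at definition time, so every call mutates the same dict.
    connection_kwargs["calls"] = connection_kwargs.get("calls", 0) + 1
    return connection_kwargs


def fresh_default(connection_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # None sentinel plus a copy gives each call its own dict.
    connection_kwargs = dict(connection_kwargs or {})
    connection_kwargs["calls"] = connection_kwargs.get("calls", 0) + 1
    return connection_kwargs
```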
