feat: LLM Router extension for cost-optimized model selection by bsbodden · Pull Request #476 · redis/redis-vl-python

bsbodden · 2026-02-16T22:27:14Z

Adds LLMRouter and AsyncLLMRouter — a new RedisVL extension that routes queries to the cheapest LLM capable of handling them using Redis vector search. This is
the natural complement to SemanticCache/LangCache: caching eliminates redundant calls, routing optimizes the calls you must make.

"hello, how are you?" → GPT-4.1 Nano ($0.10/M tokens)
"explain garbage collection" → Claude Sonnet 4.5 ($3/M tokens)
"architect a distributed system" → Claude Opus 4.5 ($5/M tokens)

Why this matters

Enterprise LLM spend reached $8.4B (Menlo Ventures, mid-2025) and 53% of AI teams exceed cost forecasts by 40%+. The root cause: every query hits the most
expensive model. Academic research (RouteLLM/ICLR 2025, FrugalGPT/Stanford) shows 30-85% cost savings from intelligent routing. A funded startup ecosystem
validates the category — OpenRouter ($500M valuation, $40M raised), Martian (Accenture-backed), NotDiamond (IBM/SAP-backed), Unify (YC/Microsoft-backed).

RedisVL's LLM Router is the first open-source, Redis-native, self-hosted, multi-tier routing solution. Combined with LangCache/SemanticCache, it forms a
complete cost optimization stack no competitor offers.

Key features

- Pretrained config: Ships with a 3-tier Bloom's Taxonomy config (simple/standard/expert) with 18 reference phrases per tier and pre-computed embeddings, so no setup is required
- Cost-aware routing: Optional cost penalty biases toward cheaper tiers when distances are close
- LiteLLM-compatible: Model strings (provider/model) work directly with LiteLLM's 100+ providers
- Per-tier thresholds: Each tier has independent distance thresholds for fine-grained control
- Full async support: AsyncLLMRouter with create() factory pattern
- Portable configs: Export/import routers with pre-computed embeddings via export_with_embeddings() / from_pretrained()
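
The cost-aware routing feature can be pictured with a small sketch. It assumes each candidate tier already carries a vector distance, its own threshold, and a price per million tokens. The `TierCandidate` type, the `cost_weight` parameter, and the linear penalty formula are illustrative assumptions, not the PR's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TierCandidate:
    """Hypothetical candidate produced by the vector search step."""
    name: str
    distance: float            # distance to the closest reference phrase
    threshold: float           # per-tier distance threshold
    cost_per_m_tokens: float   # USD per million tokens


def pick_tier(candidates: List[TierCandidate], cost_weight: float = 0.0) -> Optional[TierCandidate]:
    """Return the eligible tier with the lowest penalized distance.

    Tiers whose raw distance exceeds their own threshold are dropped first.
    The penalty (cost_weight * tier cost / max cost) is an assumed formula.
    """
    eligible = [c for c in candidates if c.distance <= c.threshold]
    if not eligible:
        return None
    max_cost = max(c.cost_per_m_tokens for c in eligible)

    def score(c: TierCandidate) -> float:
        penalty = cost_weight * c.cost_per_m_tokens / max_cost if max_cost > 0 else 0.0
        return c.distance + penalty

    return min(eligible, key=score)


candidates = [
    TierCandidate("simple", distance=0.42, threshold=0.5, cost_per_m_tokens=0.10),
    TierCandidate("standard", distance=0.40, threshold=0.5, cost_per_m_tokens=3.0),
    TierCandidate("expert", distance=0.70, threshold=0.5, cost_per_m_tokens=5.0),
]
```

With `cost_weight=0.0` the closest eligible tier wins. A small positive weight tips near-ties toward the cheaper tier, which is the behavior the feature list describes.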

Adds intelligent LLM model routing using semantic similarity:
- ModelTier: Define model tiers with references and thresholds
- LLMRouter: Route queries to optimal model tier
- LLMRouteMatch: Routing result with tier, model, confidence
- Cost optimization: Prefer cheaper tiers when distances close
- Pretrained support: Export/import with pre-computed embeddings

Integration tests define expected behavior (test-first approach). Part of redis-vl-python enhancement for intelligent LLM auto-selection.

Tests for:
- ModelTier validation (name, model, references, threshold bounds)
- LLMRouteMatch (truthy/falsy, alternatives, metadata)
- RoutingConfig (defaults, custom values, bounds)
- Pretrained schemas (reference, tier, config)
- DistanceAggregationMethod enum

- Fix from_pretrained() to use model_construct() instead of object.__new__()
- Update test_cost_optimization_prefers_cheaper to use matching query
- Update test_add_tier_references to verify references added correctly
- Add tests/unit/conftest.py to skip Docker fixtures for unit tests
- Add tests/integration/conftest.py to use local Redis when available

- test_add_tier_references now verifies reference addition without strict routing
- Cost optimization test uses query that better matches references
- All 22 integration tests should now pass

- Problem statement and existing solution limitations
- Architecture diagrams and key design decisions
- API examples and comparison with SemanticRouter
- Testing guide and future enhancements

…eddings Add a built-in 3-tier pretrained configuration (simple/standard/expert) grounded in Bloom's Taxonomy with 18 reference phrases per tier and pre-computed embeddings from sentence-transformers/all-mpnet-base-v2. Includes generation script and pretrained loader for named configs.

Add AsyncLLMRouter with async factory pattern (create() classmethod), mirroring all sync LLMRouter functionality with async I/O. Update module exports and correct simple tier model to openai/gpt-4.1-nano for accurate cost optimization.
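
The async factory pattern mentioned here exists because `__init__` cannot `await`, so any awaitable setup has to move into a classmethod. The following is a minimal sketch of that general pattern. `FakeAsyncIndex` and `AsyncRouterSketch` are hypothetical stand-ins and do not reflect the real AsyncLLMRouter signature.

```python
import asyncio
from typing import List


class FakeAsyncIndex:
    """Hypothetical stand-in for an async search index; no Redis involved."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.created = False

    async def create(self) -> None:
        await asyncio.sleep(0)  # placeholder for real network I/O
        self.created = True


class AsyncRouterSketch:
    """Illustrative router, not the PR's AsyncLLMRouter."""

    def __init__(self, name: str, tiers: List[str], index: FakeAsyncIndex) -> None:
        # __init__ stays synchronous and only stores already-prepared state.
        self.name = name
        self.tiers = tiers
        self.index = index

    @classmethod
    async def create(cls, name: str, tiers: List[str]) -> "AsyncRouterSketch":
        # All awaitable setup happens here, before the instance is built.
        index = FakeAsyncIndex(name)
        await index.create()
        return cls(name, tiers, index)


router = asyncio.run(
    AsyncRouterSketch.create("demo-router", ["simple", "standard", "expert"])
)
```

Callers never see a half-initialized object: by the time `create()` returns, the index setup has already been awaited.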

Add comprehensive async integration tests mirroring all sync tests with AsyncLLMRouter.create() factory. Add pretrained config tests for default 3-tier routing. Update model references and pricing assertions to match corrected tier definitions.

Add comprehensive Jupyter notebook (13_llm_router.ipynb) covering pretrained routing, custom tiers, cost optimization, tier management, serialization, and async usage. Update DESIGN.md with async support, pretrained config details, and corrected model pricing.

Copilot

Pull request overview

This PR introduces an LLM Router extension for RedisVL that enables cost-optimized model selection through semantic routing. The router uses Redis vector search to match queries to model tiers based on semantic similarity to reference phrases, allowing applications to route simple queries to cheaper models and complex queries to more capable (expensive) models.
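
The routing idea in this overview can be sketched without Redis. The example below uses toy 2-D vectors in place of real embeddings and a brute-force scan in place of vector search. The `TIERS` table, the `example/expert-model` string, and the choice of minimum distance as the per-tier aggregation are assumptions for illustration only.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance in [0, 2]; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length vector has no direction")
    return 1.0 - dot / norm


# Hypothetical tiers: name -> (model string, distance threshold, reference vectors).
TIERS: Dict[str, Tuple[str, float, List[List[float]]]] = {
    "simple": ("openai/gpt-4.1-nano", 0.5, [[1.0, 0.0], [0.9, 0.1]]),
    "expert": ("example/expert-model", 0.5, [[0.0, 1.0], [0.1, 0.9]]),
}


def route(query_vec: Sequence[float]) -> Optional[Tuple[str, str, float]]:
    """Return (tier, model, distance) for the closest tier within its threshold."""
    best: Optional[Tuple[str, str, float]] = None
    for name, (model, threshold, refs) in TIERS.items():
        # Aggregate per tier by taking the closest reference phrase.
        distance = min(cosine_distance(query_vec, ref) for ref in refs)
        if distance <= threshold and (best is None or distance < best[2]):
            best = (name, model, distance)
    return best
```

A query that falls outside every tier's threshold returns `None`, leaving the caller to pick a fallback model.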

Changes:

New LLMRouter and AsyncLLMRouter classes for intelligent model tier selection
Pretrained configuration system with built-in "default" config featuring 3 tiers (simple/standard/expert)
Comprehensive test suite including unit tests and integration tests for both sync and async implementations

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `redisvl/extensions/llm_router/router.py` | Core implementation of sync and async LLM routers with routing logic and tier management |
| `redisvl/extensions/llm_router/schema.py` | Pydantic models for ModelTier, LLMRouteMatch, RoutingConfig, and pretrained configurations |
| `redisvl/extensions/llm_router/__init__.py` | Public API exports for the extension |
| `redisvl/extensions/llm_router/pretrained/__init__.py` | Loader for pretrained router configurations |
| `scripts/generate_pretrained_config.py` | Script to generate pretrained configs with embedded reference vectors |
| `tests/unit/test_llm_router_schema.py` | Unit tests for schema validation and Pydantic models |
| `tests/unit/conftest.py` | Test configuration to allow unit tests without Docker/Redis |
| `tests/integration/test_llm_router.py` | Integration tests for sync LLMRouter functionality |
| `tests/integration/test_async_llm_router.py` | Integration tests for async AsyncLLMRouter functionality |
| `tests/integration/conftest.py` | Configuration for integration tests with optional Docker override |
| `redisvl/extensions/llm_router/DESIGN.md` | Comprehensive design documentation |
| `docs/user_guide/13_llm_router.ipynb` | User guide notebook with examples and usage patterns |

redisvl/extensions/llm_router/router.py

…assmethods The from_pretrained and from_existing methods (sync and async) ignored a provided redis_client because redis_url defaults to "redis://localhost:6379" and was always truthy. This caused ConnectionRefusedError in CI where Redis runs on a dynamic testcontainer port.
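
The bug described in this commit message follows a common shape: a non-empty default URL makes a truthiness check always succeed, so the explicit client is never consulted. The sketch below reproduces that shape and one possible fix. Both function names and the return format are hypothetical, and the real classmethods may resolve connections differently.

```python
from typing import Any, Optional, Tuple

DEFAULT_REDIS_URL = "redis://localhost:6379"


def resolve_connection_buggy(
    redis_client: Optional[Any] = None,
    redis_url: str = DEFAULT_REDIS_URL,
) -> Tuple[str, Any]:
    # Bug pattern: the default URL is a non-empty string, so this branch
    # always wins and a caller-supplied client is silently ignored.
    if redis_url:
        return ("url", redis_url)
    return ("client", redis_client)


def resolve_connection_fixed(
    redis_client: Optional[Any] = None,
    redis_url: Optional[str] = None,
) -> Tuple[str, Any]:
    # An explicit client takes priority; the default URL is only a fallback.
    if redis_client is not None:
        return ("client", redis_client)
    return ("url", redis_url or DEFAULT_REDIS_URL)


fake_client = object()  # stands in for a client bound to a dynamic test port
```

In a CI setup where Redis listens on a dynamic port, the buggy variant would still try `localhost:6379`, which matches the ConnectionRefusedError reported above.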

- Validate threshold range (0, 2] in update_tier_threshold before assignment, matching the ModelTier Pydantic schema constraint.
- Guard _get_tier_matches against empty tiers list to prevent ValueError from max() on empty sequence.

Applied to both sync and async implementations.
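
Both fixes in this commit are small guards, and a sketch makes the intent concrete. The `TierThresholds` class below is a hypothetical holder, and using `max()` over thresholds as a search radius is an assumption about why the real `_get_tier_matches` calls `max()` at all.

```python
from typing import Dict, Optional


class TierThresholds:
    """Hypothetical holder for per-tier distance thresholds (illustration only)."""

    def __init__(self, thresholds: Optional[Dict[str, float]] = None) -> None:
        self._thresholds: Dict[str, float] = dict(thresholds or {})

    def get(self, tier_name: str) -> float:
        return self._thresholds[tier_name]

    def update_tier_threshold(self, tier_name: str, threshold: float) -> None:
        if tier_name not in self._thresholds:
            raise KeyError(f"unknown tier: {tier_name}")
        # Validate before assignment, using the (0, 2] range named above.
        if not 0 < threshold <= 2:
            raise ValueError(f"threshold must be in (0, 2], got {threshold}")
        self._thresholds[tier_name] = threshold

    def search_radius(self) -> Optional[float]:
        # max() raises ValueError on an empty sequence, so return early
        # when no tiers are registered.
        if not self._thresholds:
            return None
        return max(self._thresholds.values())
```

Validating before assignment means a rejected value never leaves the object in an invalid state.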

Copilot

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 9 comments.

Copilot · 2026-02-17T00:52:31Z

redisvl/extensions/llm_router/router.py

+        overwrite: bool = False,
+        cost_optimization: bool = False,
+        connection_kwargs: Dict[str, Any] = {},
+        **kwargs,


**kwargs is accepted but never used in __init__, so any caller-provided extra keyword args are silently ignored (and won’t reach Redis connection setup). Either remove **kwargs to avoid surprising behavior or merge/forward it into connection_kwargs / _initialize_index.

Suggested change

**kwargs,
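
If the extra keyword arguments are kept rather than removed, one way to stop them from being dropped is to merge them into the connection kwargs. This is a minimal sketch of that option. The helper name `build_connection_kwargs` is hypothetical, and letting `**kwargs` override same-named keys is an assumed precedence rule.

```python
from typing import Any, Dict, Optional


def build_connection_kwargs(
    connection_kwargs: Optional[Dict[str, Any]] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Merge extra keyword args into the connection kwargs instead of dropping them."""
    merged = dict(connection_kwargs or {})  # copy so the caller's dict is untouched
    merged.update(kwargs)                   # assumed rule: explicit kwargs win
    return merged
```

Removing `**kwargs` entirely, as the suggestion does, is the stricter alternative: an unexpected keyword then fails with a `TypeError` at the call site.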

Copilot · 2026-02-17T00:52:32Z

redisvl/extensions/llm_router/schema.py

+    metadata: Dict[str, Any] = Field(default_factory=dict)
+    """Tier metadata."""
+
+    distance_threshold: float = 0.5


distance_threshold in PretrainedTier has no bounds validation, unlike ModelTier (gt=0, le=2). Invalid values could be loaded from JSON and break routing/filtering. Use the same constrained/strict field definition here for consistency and safety.

Suggested change

distance_threshold: float = 0.5

distance_threshold: Annotated[float, Field(strict=True, gt=0, le=2)] = 0.5

Copilot · 2026-02-17T00:52:32Z

redisvl/extensions/llm_router/router.py

+        index = AsyncSearchIndex(
+            schema=schema,
+            redis_client=redis_client,
+        )
+        await index.create(overwrite=True, drop=False)
+


Async from_pretrained() also recreates the index with overwrite=True unconditionally, which can clobber an existing router/index with the same name. Add an explicit overwrite parameter and default to non-destructive behavior.

Copilot · 2026-02-17T00:52:32Z

redisvl/extensions/llm_router/router.py

+import yaml
+from pydantic import BaseModel, ConfigDict, Field, PrivateAttr
+from redis.commands.search.aggregation import AggregateRequest, AggregateResult, Reducer
+from redis.exceptions import ResponseError


Unused imports here (e.g., AggregateResult, Reducer) will trigger pylint failures and should be removed or used. Consider importing only AggregateRequest from redis.commands.search.aggregation.

Suggested change

from redis.exceptions import ResponseError

Copilot · 2026-02-17T00:52:32Z

redisvl/extensions/llm_router/router.py

+    PretrainedTier,
+    RoutingConfig,
+)
+from redisvl.extensions.router.schema import Route, SemanticRouterIndexSchema


Route is imported but never used; this will fail pylint’s unused-import check. Remove the Route import (keep SemanticRouterIndexSchema).

Suggested change

from redisvl.extensions.router.schema import Route, SemanticRouterIndexSchema

from redisvl.extensions.router.schema import SemanticRouterIndexSchema

Copilot · 2026-02-17T00:52:32Z

redisvl/extensions/llm_router/router.py

+            schema=schema,
+            redis_client=redis_client,
+        )
+        index.create(overwrite=True, drop=False)


from_pretrained() unconditionally recreates the index with overwrite=True. This is destructive if an index with the same name already exists in the target Redis. Consider adding an overwrite: bool = False parameter (mirroring __init__) and erroring when the index exists unless overwrite is explicitly requested.

Suggested change

index.create(overwrite=True, drop=False)

index.create(overwrite=False, drop=False)
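
The non-destructive default requested in this comment can be sketched with an in-memory stand-in for the index. `InMemoryIndexStore`, `IndexExistsError`, and `from_pretrained_sketch` are hypothetical names, and nothing here describes how the real index `create()` behaves when an index already exists.

```python
from typing import Any, Dict


class IndexExistsError(RuntimeError):
    """Raised when an index would be clobbered without explicit consent."""


class InMemoryIndexStore:
    """Hypothetical stand-in for Redis index storage."""

    def __init__(self) -> None:
        self.indexes: Dict[str, Any] = {}

    def create(self, name: str, payload: Any, overwrite: bool = False) -> None:
        if name in self.indexes and not overwrite:
            raise IndexExistsError(
                f"index {name!r} already exists; pass overwrite=True to replace it"
            )
        self.indexes[name] = payload


def from_pretrained_sketch(
    store: InMemoryIndexStore,
    name: str,
    config: Any,
    overwrite: bool = False,
) -> str:
    # The flag is passed straight through, so the safe default is the caller's default.
    store.create(name, config, overwrite=overwrite)
    return name
```

Loading the same named config twice then fails loudly unless the caller opts in with `overwrite=True`.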

Copilot · 2026-02-17T00:52:32Z

redisvl/extensions/llm_router/router.py

+        overwrite: bool = False,
+        cost_optimization: bool = False,
+        connection_kwargs: Dict[str, Any] = {},
+        **kwargs,


Async factory has the same mutable-default issue: connection_kwargs defaults to {}. Switch to None and instantiate a new dict inside create() to avoid shared state and pylint warnings.

Copilot · 2026-02-17T00:52:33Z

redisvl/extensions/llm_router/router.py

+        cost_optimization: bool = False,
+        connection_kwargs: Dict[str, Any] = {},
+        **kwargs,
+    ) -> "AsyncLLMRouter":


Async create() accepts **kwargs but never uses it, so extra keyword args are silently dropped. Either remove **kwargs or forward/merge them into the Redis connection kwargs (consistent with from_existing).

Copilot · 2026-02-17T00:52:33Z

redisvl/extensions/llm_router/router.py

+        overwrite: bool = False,
+        cost_optimization: bool = False,
+        connection_kwargs: Dict[str, Any] = {},
+        **kwargs,


connection_kwargs uses a mutable default ({}), which can leak state across instances and is flagged by pylint. Use None as the default and create a new dict inside the method (or use Field(default_factory=dict) style patterns).
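
The state leak behind this comment is easy to reproduce in isolation. The two functions below are hypothetical and only demonstrate the language behavior: a default `{}` is created once at function definition time and shared across calls.

```python
from typing import Any, Dict, Optional


def connect_shared(connection_kwargs: Dict[str, Any] = {}) -> Dict[str, Any]:
    # Pitfall: the same dict object is reused on every call that omits the argument.
    connection_kwargs.setdefault("calls", 0)
    connection_kwargs["calls"] += 1
    return connection_kwargs


def connect_isolated(connection_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    # Fix: default to None and build a fresh dict on each call.
    kwargs = dict(connection_kwargs) if connection_kwargs is not None else {}
    kwargs.setdefault("calls", 0)
    kwargs["calls"] += 1
    return kwargs
```

Two routers constructed through the shared variant would see each other's mutations, while the `None` default keeps each call independent.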

bsbodden added 9 commits February 16, 2026 13:26

test(llm-router): simplify test assertions for semantic matching

1b7b0e1

docs(llm-router): add comprehensive DESIGN.md

91e8c99

Copilot AI review requested due to automatic review settings February 16, 2026 22:27

Copilot started reviewing on behalf of bsbodden February 16, 2026 22:27

bsbodden force-pushed the llm-router branch from 0c13644 to fda6eb6 February 16, 2026 22:31

Copilot AI reviewed Feb 16, 2026

bsbodden added the experimental label Feb 16, 2026

bsbodden requested review from abrookins and tylerhutcherson February 16, 2026 23:19

Copilot AI review requested due to automatic review settings February 17, 2026 00:45

Copilot started reviewing on behalf of bsbodden February 17, 2026 00:45

bsbodden self-assigned this Feb 17, 2026

Copilot AI reviewed Feb 17, 2026

bsbodden wants to merge 11 commits into main from llm-router
